distcp

[hadoop@hadoopmaster test]$ hadoop distcp hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir
15/11/18 05:39:30 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:39:30 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:39:31 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
15/11/18 05:39:31 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
15/11/18 05:39:31 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:39:32 INFO mapreduce.JobSubmitter: number of splits:2
15/11/18 05:39:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447853441917_0001
15/11/18 05:39:32 INFO impl.YarnClientImpl: Submitted application application_1447853441917_0001
15/11/18 05:39:33 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1447853441917_0001/
15/11/18 05:39:33 INFO tools.DistCp: DistCp job-id: job_1447853441917_0001
15/11/18 05:39:33 INFO mapreduce.Job: Running job: job_1447853441917_0001
15/11/18 05:39:41 INFO mapreduce.Job: Job job_1447853441917_0001 running in uber mode : false
15/11/18 05:39:41 INFO mapreduce.Job: map 0% reduce 0%
15/11/18 05:39:48 INFO mapreduce.Job: map 50% reduce 0%
15/11/18 05:39:50 INFO mapreduce.Job: map 100% reduce 0%
15/11/18 05:39:50 INFO mapreduce.Job: Job job_1447853441917_0001 completed successfully
15/11/18 05:39:50 INFO mapreduce.Job: Counters: 33
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=216204
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1220
                HDFS: Number of bytes written=24
                HDFS: Number of read operations=31
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=2
                Other local map tasks=2
                Total time spent by all maps in occupied slots (ms)=10356
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=10356
                Total vcore-seconds taken by all map tasks=10356
                Total megabyte-seconds taken by all map tasks=10604544
        Map-Reduce Framework
                Map input records=3
                Map output records=0
                Input split bytes=272
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=156
                CPU time spent (ms)=1320
                Physical memory (bytes) snapshot=342798336
                Virtual memory (bytes) snapshot=1753182208
                Total committed heap usage (bytes)=169869312
        File Input Format Counters
                Bytes Read=924
        File Output Format Counters
                Bytes Written=0
        org.apache.hadoop.tools.mapred.CopyMapper$Counter
                BYTESCOPIED=24
                BYTESEXPECTED=24
                COPY=3
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-11-18 05:39 /jacktest/todir
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-11-18 05:39 /jacktest/todir/jacktest.db
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir/jacktest.db
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-11-18 05:39 /jacktest/todir/jacktest.db/test1
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir/jacktest.db/test1
Found 1 items
-rw-r--r--   1 hadoop supergroup         24 2015-11-18 05:39 /jacktest/todir/jacktest.db/test1/test.body
[hadoop@hadoopmaster test]$ hadoop fs -cat /jacktest/todir/jacktest.db/test1/test.body
1,jack
2,josson
3,gavin
[hadoop@hadoopmaster test]$

hive> create table test1(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.454 seconds
hive> select * from test1;
OK
Time taken: 0.65 seconds
hive> show create table test1;
OK
CREATE TABLE `test1`(
`id` int,
`name` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db/test1'
TBLPROPERTIES (
'transient_lastDdlTime'='1447853584')
Time taken: 0.152 seconds, Fetched: 13 row(s)

[hadoop@hadoopmaster test]$ vi test.body

1,jack
2,josson
3,gavin

关于协议
如果两个集群间的版本不一致，那么使用hdfs可能就会产生错误，因为rpc系统不兼容。那么这时候你可以使用基于http协议的hftp协议，但目标地址还必须是hdfs的，象这样：
hadoop distcp hftp://namenode:50070/user/hadoop/input hdfs://namenode:9000/user/hadoop/input1
推荐用hftp的替代协议webhdfs，源地址和目标地址都可以使用webhdfs，可以完全兼容

hadoop distcp hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1

[hadoop@hadoopmaster test]$ hadoop fs -mkdir /jacktest/todir1
[hadoop@hadoopmaster test]$ hadoop distcp hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1
15/11/18 05:44:32 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir1, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:44:32 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:44:33 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException: hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db doesn't exist
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
[hadoop@hadoopmaster test]$ hadoop distcp hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1
15/11/18 05:45:10 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir1, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:45:10 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:45:11 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
15/11/18 05:45:11 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
15/11/18 05:45:11 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:45:11 INFO mapreduce.JobSubmitter: number of splits:2
15/11/18 05:45:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447853441917_0002
15/11/18 05:45:11 INFO impl.YarnClientImpl: Submitted application application_1447853441917_0002
15/11/18 05:45:12 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1447853441917_0002/
15/11/18 05:45:12 INFO tools.DistCp: DistCp job-id: job_1447853441917_0002
15/11/18 05:45:12 INFO mapreduce.Job: Running job: job_1447853441917_0002
15/11/18 05:45:18 INFO mapreduce.Job: Job job_1447853441917_0002 running in uber mode : false
15/11/18 05:45:18 INFO mapreduce.Job: map 0% reduce 0%
15/11/18 05:45:24 INFO mapreduce.Job: map 50% reduce 0%
15/11/18 05:45:26 INFO mapreduce.Job: map 100% reduce 0%
15/11/18 05:45:26 INFO mapreduce.Job: Job job_1447853441917_0002 completed successfully
15/11/18 05:45:26 INFO mapreduce.Job: Counters: 38
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=216208
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1200
                HDFS: Number of bytes written=24
                HDFS: Number of read operations=25
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
                HFTP: Number of bytes read=0
                HFTP: Number of bytes written=0
                HFTP: Number of read operations=0
                HFTP: Number of large read operations=0
                HFTP: Number of write operations=0
        Job Counters
                Launched map tasks=2
                Other local map tasks=2
                Total time spent by all maps in occupied slots (ms)=10014
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=10014
                Total vcore-seconds taken by all map tasks=10014
                Total megabyte-seconds taken by all map tasks=10254336
        Map-Reduce Framework
                Map input records=3
                Map output records=0
                Input split bytes=272
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=104
                CPU time spent (ms)=2240
                Physical memory (bytes) snapshot=345600000
                Virtual memory (bytes) snapshot=1751683072
                Total committed heap usage (bytes)=169869312
        File Input Format Counters
                Bytes Read=928
        File Output Format Counters
                Bytes Written=0
        org.apache.hadoop.tools.mapred.CopyMapper$Counter
                BYTESCOPIED=24
                BYTESEXPECTED=24
                COPY=3
[hadoop@hadoopmaster test]$

猜你喜欢