Copyright notice: this is an original article by the blogger and may not be reproduced without permission. https://blog.csdn.net/weixin_43215250/article/details/90643968
Usage:
hadoop distcp OPTIONS [source_path...] <target_path>
Option | Description |
---|---|
-append | Reuse existing data in the target files and append new data to them where possible |
-atomic | Commit all changes or none |
-bandwidth | Specify the bandwidth for each map, in MB/s |
-blocksperchunk | If set to a positive value, files with more blocks than this value are split into chunks of this many blocks, transferred in parallel, and reassembled on the target. By default the value is 0, and files are transferred whole without being split. This switch only applies when the source file system implements the getBlockLocations method and the target file system implements the concat method |
-delete | Delete files from the target that are missing from the source |
-diff | Use a snapshot diff report to identify differences between source and target |
-f | File containing the list of files to copy |
-filelimit | (Deprecated) Limit the number of files copied to <= n |
-filters | File listing patterns to exclude from the copy |
-i | Ignore failures during the copy |
-log | Path where the distcp execution log is saved |
-m | Maximum number of maps |
-mapredSslConf | SSL configuration file; must be on the classpath when using hftps:// |
-numListstatusThreads | Number of threads used to build the file listing (at most 40) |
-overwrite | Unconditionally overwrite target files, even if they already exist |
-p | Preserve status (rbugpcaxt): replication, block size, user, group, permission, checksum type, ACL, XATTR, timestamp |
-rdiff | Use a snapshot diff report of the target to identify changes on the target |
-sizelimit | (Deprecated) Limit the total bytes copied to <= n bytes |
-skipcrccheck | Whether to skip CRC checks between source and target paths |
-strategy | Copy strategy to use. By default, work is divided based on file size |
-tmp | Intermediate work path used for the atomic commit |
-update | Update the target: copy only missing files or directories |
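As an illustration of combining several of the options above, an incremental sync between the same two paths used in the example below might look like this. This is only a sketch: the NameNode address and paths are taken from the example log, and the command is assembled and echoed rather than executed, so you can review it before running it on a real cluster.

```shell
# Source and target from the example below -- substitute your own cluster's values.
SRC="hdfs://192.168.1.11:8020/tmp/hbase/test"
DST="/tmp/hbase/test"

# -update: copy only files missing or changed at the target
# -delete: remove target files that no longer exist at the source
# -m 10: use at most 10 map tasks
# -bandwidth 50: limit each map to 50 MB/s
echo hadoop distcp -update -delete -m 10 -bandwidth 50 "$SRC" "$DST"
```

Note that `-delete` is only meaningful together with `-update` or `-overwrite`, and it removes target-side files, so double-check the paths before running the real command.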
[root@cdh01:~]# hadoop distcp hdfs://192.168.1.11:8020/tmp/hbase/test/ /tmp/hbase/test/
19/05/29 10:01:49 INFO tools.OptionsParser: parseChunkSize: blocksperchunk false
19/05/29 10:01:50 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://192.168.1.11:8020/tmp/hbase/test], targetPath=/tmp/hbase/test, targetPathExists=false, filtersFile='null', blocksPerChunk=0}
19/05/29 10:01:50 INFO client.RMProxy: Connecting to ResourceManager at cdh01/192.168.1.101:8032
19/05/29 10:01:51 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 20; dirCnt = 1
19/05/29 10:01:51 INFO tools.SimpleCopyListing: Build file listing completed.
19/05/29 10:01:51 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
19/05/29 10:01:51 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
19/05/29 10:01:51 INFO tools.DistCp: Number of paths in the copy list: 20
19/05/29 10:01:51 INFO tools.DistCp: Number of paths in the copy list: 20
19/05/29 10:01:51 INFO client.RMProxy: Connecting to ResourceManager at cdh01/192.168.1.101:8032
19/05/29 10:01:51 INFO mapreduce.JobSubmitter: number of splits:11
19/05/29 10:01:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1559091338325_0002
19/05/29 10:01:52 INFO impl.YarnClientImpl: Submitted application application_1559091338325_0002
19/05/29 10:01:52 INFO mapreduce.Job: The url to track the job: http://cdh01:8088/proxy/application_1559091338325_0002/
19/05/29 10:01:52 INFO tools.DistCp: DistCp job-id: job_1559091338325_0002
19/05/29 10:01:52 INFO mapreduce.Job: Running job: job_1559091338325_0002
19/05/29 10:01:58 INFO mapreduce.Job: Job job_1559091338325_0002 running in uber mode : false
19/05/29 10:01:58 INFO mapreduce.Job: map 0% reduce 0%
19/05/29 10:02:03 INFO mapreduce.Job: map 9% reduce 0%
19/05/29 10:02:04 INFO mapreduce.Job: map 18% reduce 0%
19/05/29 10:02:07 INFO mapreduce.Job: map 27% reduce 0%
19/05/29 10:02:08 INFO mapreduce.Job: map 36% reduce 0%
19/05/29 10:02:09 INFO mapreduce.Job: map 45% reduce 0%
19/05/29 10:02:14 INFO mapreduce.Job: map 55% reduce 0%
19/05/29 10:02:16 INFO mapreduce.Job: map 64% reduce 0%
19/05/29 10:02:18 INFO mapreduce.Job: map 82% reduce 0%
19/05/29 10:02:20 INFO mapreduce.Job: map 91% reduce 0%
19/05/29 10:02:23 INFO mapreduce.Job: map 100% reduce 0%
19/05/29 10:04:33 INFO mapreduce.Job: Job job_1559091338325_0002 completed successfully
19/05/29 10:04:34 INFO mapreduce.Job: Counters: 33
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=1638715
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=16936022957
		HDFS: Number of bytes written=16936015418
		HDFS: Number of read operations=234
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=61
	Job Counters
		Launched map tasks=11
		Other local map tasks=11
		Total time spent by all maps in occupied slots (ms)=679786
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=679786
		Total vcore-milliseconds taken by all map tasks=679786
		Total megabyte-milliseconds taken by all map tasks=1044151296
	Map-Reduce Framework
		Map input records=20
		Map output records=0
		Input split bytes=1265
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=1804
		CPU time spent (ms)=133510
		Physical memory (bytes) snapshot=3378626560
		Virtual memory (bytes) snapshot=33501384704
		Total committed heap usage (bytes)=2952790016
	File Input Format Counters
		Bytes Read=6274
	File Output Format Counters
		Bytes Written=0
	DistCp Counters
		Bytes Copied=16936015418
		Bytes Expected=16936015418
		Files Copied=20
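The DistCp Counters at the end of the log are the quickest success check: Bytes Copied should equal Bytes Expected, and Files Copied should match the size of the copy list (20 here). A minimal sketch of automating that check over a saved log, with the counter lines inlined from the run above for illustration:

```shell
# Counter lines as printed by the job above (in practice, grep them from the saved log).
LOG='Bytes Copied=16936015418
Bytes Expected=16936015418
Files Copied=20'

# Extract each counter value by stripping its name prefix.
copied=$(echo "$LOG" | sed -n 's/^Bytes Copied=//p')
expected=$(echo "$LOG" | sed -n 's/^Bytes Expected=//p')

if [ "$copied" -eq "$expected" ]; then
  echo "OK: $copied bytes copied"
else
  echo "MISMATCH: copied=$copied expected=$expected" >&2
fi
```

For a stronger guarantee you can also compare source and target with `hadoop fs -count`, since DistCp's byte counters alone do not prove the directory trees match.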