2020/12/15 [email protected]
Common HDFS Operations and Administration Commands (FSCK, Safe Mode, Quotas)
1. Common Operation Commands
1.1 Basic Syntax
hadoop fs [options]
hdfs dfs [options]
(For HDFS the two forms behave the same; hdfs dfs operates only on HDFS, while hadoop fs works with any file system Hadoop supports.)
1.2 Full Option List
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] <path> ...]
[-cp [-f] [-p] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]
1.3 Examples
-help: print usage information for a command
bin/hdfs dfs -help rm
-ls: list directory contents
hadoop fs -ls /
hadoop fs -ls -R   (equivalent to the deprecated hadoop fs -lsr)
-mkdir: create a directory on HDFS
hadoop fs -mkdir -p /user/bduser/test
-moveFromLocal: move (cut and paste) a local file to HDFS
hadoop fs -moveFromLocal /home/hadoop/a.txt /user/bduser/test
-moveToLocal: move (cut and paste) from HDFS to the local file system (note: some Hadoop releases report this option as not implemented yet)
hadoop fs -moveToLocal /user/bduser/test /home/hadoop/a.txt
-appendToFile: append a local file to the end of an existing HDFS file
hadoop fs -appendToFile ./hello.txt /hello.txt
-cat: display file contents
hadoop fs -cat /hello.txt
-tail: display the end of a file
hadoop fs -tail /weblog/access_log.1
-text: print the contents of a file as text
hadoop fs -text /weblog/access_log.1
-chgrp, -chmod, -chown: same usage as in the Linux file system; change a file's group, permissions, or owner
hadoop fs -chmod 666 /hello.txt
hadoop fs -chown someuser:somegrp /hello.txt
-copyFromLocal: copy a file from the local file system to an HDFS path
hadoop fs -copyFromLocal ./jdk.tar.gz /aaa/
-copyToLocal: copy from HDFS to the local file system
hadoop fs -copyToLocal /aaa/jdk.tar.gz
-cp: copy from one HDFS path to another
hadoop fs -cp /user/bduser/test/jdk.tar.gz /user/bduser/soft/jdk.tar.gz
-mv: move a file within HDFS
hadoop fs -mv /user/bduser/test/jdk.tar.gz /
-get: same as copyToLocal; download a file from HDFS to the local file system
hadoop fs -get /user/bduser/test/jdk.tar.gz ~/test/
-getmerge: download and merge multiple files, e.g. when the HDFS directory /aaa/ contains log.1, log.2, log.3, ...
hadoop fs -getmerge /aaa/log.* ./log.sum
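Conceptually, -getmerge downloads each matching file and concatenates them in order into one local file. A minimal local sketch of that concatenation step (the file names and contents below are hypothetical):

```shell
# Local equivalent of the merge step in -getmerge: concatenate the parts
# in name order into a single file (hypothetical names/contents).
tmp=$(mktemp -d)
printf 'first\n'  > "$tmp/log.1"
printf 'second\n' > "$tmp/log.2"
cat "$tmp"/log.* > "$tmp/log.sum"   # log.sum holds log.1 then log.2
cat "$tmp/log.sum"
rm -rf "$tmp"
```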
-put: same as copyFromLocal
hadoop fs -put /opt/softwares/jdk.tar.gz /user/bduser/test/jdk.tar.gz
-rm: delete files or directories
hadoop fs -rm -r /user/bduser/test/adir/
hadoop fs -rmr /user/bduser/test/bdir/   (deprecated form of -rm -r)
-rmdir: delete empty directories
hadoop fs -rmdir /user/bduser/test/cdir
-df: report the file system's free space
hadoop fs -df -h /
-du: report directory sizes
hadoop fs -du -s -h /user/bduser/*
-count: count the directories and files under a given path
hadoop fs -count /user/bduser/
-setrep: set the replication factor of a file in HDFS
hadoop fs -setrep 3 /aaa/jdk.tar.gz
(The replication factor set here is only recorded in the NameNode's metadata; whether that many replicas actually exist depends on the number of DataNodes. With only 3 machines there can be at most 3 replicas; only when the cluster grows to 10 nodes can the replication actually reach 10.)
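The caveat above amounts to a simple rule of thumb (a sketch, not HDFS's actual placement logic): since no DataNode stores two replicas of the same block, the replicas actually placed are capped by the number of live DataNodes.

```shell
# Effective replication sketch: the NameNode records the requested factor,
# but the replicas actually placed are min(requested, live DataNodes).
effective_replication() {
    requested=$1
    datanodes=$2
    if [ "$requested" -le "$datanodes" ]; then
        echo "$requested"
    else
        echo "$datanodes"
    fi
}
effective_replication 10 3   # only 3 replicas exist until more nodes join
```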
2. FSCK
HDFS provides the fsck command to check the health of files and directories and to retrieve block and location information for files.
2.1 Basic Syntax
hdfs fsck <path>
2.2 Full Option List
[]$ hdfs fsck
Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots] [-storagepolicies] [-blockId <blk_Id>]
<path> start checking from this path
-move move corrupted files to /lost+found
-delete delete corrupted files
-files print out files being checked
-openforwrite print out files opened for write
-includeSnapshots include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
-list-corruptfileblocks print out list of missing blocks and files they belong to
-blocks print out block report
-locations print out locations for every block
-racks print out network topology for data-node locations
-storagepolicies print out storage policy summary for the blocks
-blockId print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)
2.3 Examples
List corrupt file blocks (-list-corruptfileblocks)
[]$hdfs fsck /user/hadoop-twq/cmd -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2Fuser%2Fhadoop-twq%2Fcmd
The filesystem under path '/user/hadoop-twq/cmd' has 0 CORRUPT files
Handle corrupt files (-move / -delete)
Move files containing corrupt blocks to /lost+found:
[]$hdfs fsck /user/hadoop-twq/cmd -move
FSCK started by hadoop (auth:SIMPLE) from /172.16.212.17 for path /user/hadoop-twq/cmd at Thu Aug 13 09:36:35 CST 2015
.Status: HEALTHY
Total size: 13497058 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 13497058 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 15
Number of racks: 1
FSCK ended at Thu Aug 13 09:36:35 CST 2015 in 1 milliseconds
The filesystem under path '/user/hadoop-twq/cmd' is HEALTHY
Delete files containing corrupt blocks:
[]$hdfs fsck /user/hadoop-twq/cmd -delete
FSCK started by hadoop (auth:SIMPLE) from /172.16.212.17 for path /user/hadoop-twq/cmd at Thu Aug 13 09:37:58 CST 2015
.Status: HEALTHY
Total size: 13497058 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 13497058 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 15
Number of racks: 1
FSCK ended at Thu Aug 13 09:37:58 CST 2015 in 1 milliseconds
The filesystem under path '/user/hadoop-twq/cmd' is HEALTHY
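Automated health checks usually parse the fsck summary rather than read it by hand. A sketch that extracts the corrupt-block count from captured summary text (the sample lines are copied from the output above):

```shell
# Extract the corrupt-block count from an fsck summary. Summary lines have
# the form "Label: value", so split on the colon plus padding.
summary='Corrupt blocks: 0
Missing replicas: 0 (0.0 %)'
corrupt=$(echo "$summary" | awk -F': *' '/Corrupt blocks/ {print $2}')
if [ "$corrupt" -eq 0 ]; then
    echo "filesystem healthy"
else
    echo "found $corrupt corrupt blocks"
fi
```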
Check and list the status of all files (-files)
[]$hdfs fsck /user/hadoop-twq/cmd -files
FSCK started by hadoop (auth:SIMPLE) from /172.16.212.17 for path /user/hadoop-twq/cmd at Thu Aug 13 09:39:38 CST 2015
/user/hadoop-twq/cmd <dir>
/user/hadoop-twq/cmd/_SUCCESS 0 bytes, 0 block(s): OK
/user/hadoop-twq/cmd/part-00000 13583807 bytes, 1 block(s): OK
/user/hadoop-twq/cmd/part-00001 13577427 bytes, 1 block(s): OK
/user/hadoop-twq/cmd/part-00002 13588601 bytes, 1 block(s): OK
/user/hadoop-twq/cmd/part-00003 13479213 bytes, 1 block(s): OK
/user/hadoop-twq/cmd/part-00004 13497012 bytes, 1 block(s): OK
/user/hadoop-twq/cmd/part-00005 13557451 bytes, 1 block(s): OK
/user/hadoop-twq/cmd/part-00006 13580267 bytes, 1 block(s): OK
/user/hadoop-twq/cmd/part-00007 13486035 bytes, 1 block(s): OK
/user/hadoop-twq/cmd/part-00008 13481498 bytes, 1 block(s): OK
...
Print files currently open for write (-openforwrite)
[]$ hdfs fsck /user/hadoop-twq/ -openforwrite
FSCK started by hadoop (auth:SIMPLE) from /172.16.212.17 for path /user/hadoop-twq/cmd at Thu Aug 13 09:41:28 CST 2015
....................................................................................................
....................................................................................................
.Status: HEALTHY
Total size: 2704782548 B
Total dirs: 1
Total files: 201
Total symlinks: 0
Total blocks (validated): 200 (avg. block size 13523912 B)
Minimally replicated blocks: 200 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 15
Number of racks: 1
FSCK ended at Thu Aug 13 09:41:28 CST 2015 in 10 milliseconds
The filesystem under path '/user/hadoop-twq/cmd' is HEALTHY
Print a file's block report (-blocks)
Must be used together with -files.
[]$ hdfs fsck /user/hadoop-twq -files -blocks
FSCK started by hadoop (auth:SIMPLE) from /172.16.212.17 for path /user/hadoop-twq/cmd at Thu Aug 13 09:45:59 CST 2015
/user/hadoop-twq/cmd 7408754725 bytes, 56 block(s): OK
0. BP-1034052771-172.16.212.130-1405595752491:blk_1075892982_2152381 len=134217728 repl=2
1. BP-1034052771-172.16.212.130-1405595752491:blk_1075892983_2152382 len=134217728 repl=2
2. BP-1034052771-172.16.212.130-1405595752491:blk_1075892984_2152383 len=134217728 repl=2
3. BP-1034052771-172.16.212.130-1405595752491:blk_1075892985_2152384 len=134217728 repl=2
4. BP-1034052771-172.16.212.130-1405595752491:blk_1075892997_2152396 len=134217728 repl=2
5. BP-1034052771-172.16.212.130-1405595752491:blk_1075892998_2152397 len=134217728 repl=2
6. BP-1034052771-172.16.212.130-1405595752491:blk_1075892999_2152398 len=134217728 repl=2
7. BP-1034052771-172.16.212.130-1405595752491:blk_1075893000_2152399 len=134217728 repl=2
8. BP-1034052771-172.16.212.130-1405595752491:blk_1075893001_2152400 len=134217728 repl=2
9. BP-1034052771-172.16.212.130-1405595752491:blk_1075893002_2152401 len=134217728 repl=2
10. BP-1034052771-172.16.212.130-1405595752491:blk_1075893007_2152406 len=134217728 repl=2
...
/user/hadoop-twq/cmd 7408754725 bytes, 56 block(s): the file's total size and block count;
0. BP-1034052771-172.16.212.130-1405595752491:blk_1075892982_2152381 len=134217728 repl=2
1. BP-1034052771-172.16.212.130-1405595752491:blk_1075892983_2152382 len=134217728 repl=2
- the leading 0., 1., 2., ... is the block's index within the file; a 56-block file runs from 0 to 55;
- BP-1034052771-172.16.212.130-1405595752491:blk_1075892982_2152381 is the block ID;
- len=134217728 is the block's size in bytes;
- repl=2 is the block's replication factor.
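These fields can be split apart mechanically when scripting against fsck output. A sketch that parses one block-report line (the sample line is copied from the report above):

```shell
# Parse one fsck block-report line of the form
# "<idx>. <pool>:<block_id> len=<bytes> repl=<n>".
line='0. BP-1034052771-172.16.212.130-1405595752491:blk_1075892982_2152381 len=134217728 repl=2'
idx=${line%%.*}                                        # block index
blk=$(echo "$line"  | awk '{split($2, a, ":"); print a[2]}')
len=$(echo "$line"  | awk '{sub(/^len=/,  "", $3); print $3}')
repl=$(echo "$line" | awk '{sub(/^repl=/, "", $4); print $4}')
echo "index=$idx block=$blk len=$len repl=$repl"
```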
Print block location information (-locations)
Must be used together with -files -blocks.
[hadoop@dev ~]$ hdfs fsck /user/hadoop-twq/cmd -files -blocks -locations
FSCK started by hadoop (auth:SIMPLE) from /172.16.212.17 for path /user/hadoop-twq/cmd at Thu Aug 13 09:45:59 CST 2015
/user/hadoop-twq/cmd 7408754725 bytes, 56 block(s): OK
0. BP-1034052771-172.16.212.130-1405595752491:blk_1075892982_2152381 len=134217728 repl=2 [172.16.212.139:50010, 172.16.212.135:50010]
1. BP-1034052771-172.16.212.130-1405595752491:blk_1075892983_2152382 len=134217728 repl=2 [172.16.212.140:50010, 172.16.212.133:50010]
2. BP-1034052771-172.16.212.130-1405595752491:blk_1075892984_2152383 len=134217728 repl=2 [172.16.212.136:50010, 172.16.212.141:50010]
3. BP-1034052771-172.16.212.130-1405595752491:blk_1075892985_2152384 len=134217728 repl=2 [172.16.212.133:50010, 172.16.212.135:50010]
4. BP-1034052771-172.16.212.130-1405595752491:blk_1075892997_2152396 len=134217728 repl=2 [172.16.212.142:50010, 172.16.212.139:50010]
5. BP-1034052771-172.16.212.130-1405595752491:blk_1075892998_2152397 len=134217728 repl=2 [172.16.212.133:50010, 172.16.212.139:50010]
6. BP-1034052771-172.16.212.130-1405595752491:blk_1075892999_2152398 len=134217728 repl=2 [172.16.212.141:50010, 172.16.212.135:50010]
7. BP-1034052771-172.16.212.130-1405595752491:blk_1075893000_2152399 len=134217728 repl=2 [172.16.212.144:50010, 172.16.212.142:50010]
8. BP-1034052771-172.16.212.130-1405595752491:blk_1075893001_2152400 len=134217728 repl=2 [172.16.212.133:50010, 172.16.212.138:50010]
9. BP-1034052771-172.16.212.130-1405595752491:blk_1075893002_2152401 len=134217728 repl=2 [172.16.212.140:50010, 172.16.212.134:50010]
...
Block locations: [172.16.212.139:50010, 172.16.212.135:50010]
Print rack information for block locations (-racks)
Must be used together with -files -blocks -locations.
[]$hdfs fsck /user/hadoop-twq/cmd -files -blocks -locations -racks
FSCK started by hadoop (auth:SIMPLE) from /172.16.212.17 for path /user/hadoop-twq/cmd at Thu Aug 13 09:45:59 CST 2015
/user/hadoop-twq/cmd 7408754725 bytes, 56 block(s): OK
0. BP-1034052771-172.16.212.130-1405595752491:blk_1075892982_2152381 len=134217728 repl=2 [/default-rack/172.16.212.139:50010, /default-rack/172.16.212.135:50010]
1. BP-1034052771-172.16.212.130-1405595752491:blk_1075892983_2152382 len=134217728 repl=2 [/default-rack/172.16.212.140:50010, /default-rack/172.16.212.133:50010]
2. BP-1034052771-172.16.212.130-1405595752491:blk_1075892984_2152383 len=134217728 repl=2 [/default-rack/172.16.212.136:50010, /default-rack/172.16.212.141:50010]
3. BP-1034052771-172.16.212.130-1405595752491:blk_1075892985_2152384 len=134217728 repl=2 [/default-rack/172.16.212.133:50010, /default-rack/172.16.212.135:50010]
4. BP-1034052771-172.16.212.130-1405595752491:blk_1075892997_2152396 len=134217728 repl=2 [/default-rack/172.16.212.142:50010, /default-rack/172.16.212.139:50010]
5. BP-1034052771-172.16.212.130-1405595752491:blk_1075892998_2152397 len=134217728 repl=2 [/default-rack/172.16.212.133:50010, /default-rack/172.16.212.139:50010]
...
Rack information: [/default-rack/172.16.212.139:50010, /default-rack/172.16.212.135:50010]
3. Safe Mode
3.1 Overview
3.1.1 NameNode Startup
When the NameNode starts, it first loads the image file (fsimage) into memory and replays the operations recorded in the edit log (edits). Once a complete image of the file system metadata has been built in memory, it writes a new fsimage file and an empty edit log, and begins listening for DataNode requests. Throughout this process the NameNode runs in safe mode: its file system is read-only to clients.
3.1.2 DataNode Startup
Block locations are not persisted by the NameNode; they are stored on the DataNodes in the form of block lists. During normal operation the NameNode keeps a map of all block locations in memory. In safe mode, each DataNode sends its latest block list to the NameNode; once the NameNode has learned enough block locations, the file system can operate efficiently.
3.1.3 Safe Mode Exit
If the "minimal replication condition" is met, the NameNode exits safe mode after 30 seconds. The minimal replication condition means that 99.9% of the blocks in the entire file system satisfy the minimum replication level (default: dfs.replication.min = 1). When starting a freshly formatted HDFS cluster, the NameNode does not enter safe mode, because there are no blocks in the system yet.
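The exit condition is just a ratio check. A sketch with hypothetical block counts (the 99.9% threshold corresponds to dfs.namenode.safemode.threshold-pct, default 0.999):

```shell
# Safe-mode exit sketch: the NameNode leaves safe mode (after a 30-second
# extension) once reported/total >= 99.9%. The counts below are made up.
total_blocks=10000
reported_blocks=9991      # blocks meeting the minimal replication level
# Avoid floating point: reported/total >= 999/1000
if [ $((reported_blocks * 1000)) -ge $((total_blocks * 999)) ]; then
    echo "threshold met"
else
    echo "still below threshold"
fi
```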
3.2 Basic Syntax
While the cluster is starting up and still in safe mode, important (write) operations cannot be performed. Once startup completes, the cluster leaves safe mode automatically.
- bin/hdfs dfsadmin -safemode get (check the safe mode status)
- bin/hdfs dfsadmin -safemode enter (enter safe mode)
- bin/hdfs dfsadmin -safemode leave (leave safe mode)
- bin/hdfs dfsadmin -safemode wait (block until safe mode is exited)
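A common use of -safemode wait is to gate a startup script so writes only begin once the NameNode is writable. A minimal sketch (safe_put is a hypothetical helper; it assumes an hdfs client on the PATH):

```shell
# Block until the NameNode leaves safe mode, then write. "-safemode wait"
# returns only after safe mode is OFF, so the -put below cannot be
# rejected with a SafeModeException.
safe_put() {
    # $1 = local source file, $2 = HDFS destination path
    hdfs dfsadmin -safemode wait || return 1
    hdfs dfs -put -f "$1" "$2"
}
# Example: safe_put /tmp/report.csv /user/bduser/reports/report.csv
```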
4. Quota Management (Quota)
HDFS allows an administrator to set quotas on individual directories. Newly created directories have no quota. The largest quota is Long.MAX_VALUE. A quota of 1 forces a directory to remain empty.
A name quota is a hard limit on the number of names in the tree rooted at that directory. File and directory creation fails if it would exceed the quota. Renaming does not change a directory's quota, but a rename fails if it would violate a quota constraint. An attempt to set a quota still succeeds even if the existing name count already exceeds the new quota.
Quotas persist with the fsimage. On startup, if the fsimage is in violation of a quota (perhaps the fsimage was surreptitiously modified), a warning is printed. Setting or removing a quota creates a journal entry.
In a multi-tenant HDFS environment, quota configuration matters. Especially when Hadoop processes large volumes of data, without quota management one user can easily consume all available space and lock everyone else out. HDFS quotas are set per directory, not per account, so in practice it is best to confine each account to a single directory and set quotas on that directory.
By default HDFS imposes no quota limits; use hadoop fs -count -q to inspect quota settings:
hadoop fs -count -q /user/seamon
  QUOTA  REMAINING_QUOTA  SPACE_QUOTA  REMAINING_SPACE_QUOTA  DIR_COUNT  FILE_COUNT  CONTENT_SIZE  FILE_NAME
   none              inf         none                    inf          6          15
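Scripts can check whether a quota is set by reading the first columns of this output. A sketch over a sample line (the numbers are hypothetical; column order as shown above):

```shell
# Columns of "hadoop fs -count -q": QUOTA REMAINING_QUOTA SPACE_QUOTA
# REMAINING_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
# Sample line with hypothetical values:
line='none inf none inf 6 15 13497058 /user/seamon'
quota=$(echo "$line" | awk '{print $1}')
space_quota=$(echo "$line" | awk '{print $3}')
if [ "$quota" = "none" ] && [ "$space_quota" = "none" ]; then
    echo "no quotas set on $(echo "$line" | awk '{print $8}')"
fi
```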
4.1 Name Quotas
A name quota limits the total number of names under a directory: it is a hard limit on the number of file and directory names in the tree rooted at that directory. File and directory creation fails if the quota would be exceeded. Quotas stick with renamed directories; the rename fails if it would violate a quota. The attempt to set a quota still succeeds even if the directory is already in violation of the new quota. Newly created directories have no associated quota. The largest quota is Long.MAX_VALUE. A quota of 1 forces a directory to remain empty (a directory counts against its own quota!). Quotas persist with the fsimage; on startup, if the fsimage contains a directory in violation of its quota (perhaps the fsimage was surreptitiously modified), a warning is printed. Setting or removing a quota creates a journal entry.
hdfs dfsadmin -setQuota <N> <dir1> <dir2> ...
Sets a hard limit of N on the number of names (files and directories, the directory itself included) under each given directory.
hdfs dfsadmin -clrQuota <dir1> <dir2> ...
Removes the name quota from each given directory.
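Since the quota covers the directory itself, a name quota of N leaves room for at most N-1 names beneath it, which is why a quota of 1 forces the directory to stay empty. The accounting, sketched:

```shell
# Name-quota accounting sketch: the directory counts against its own
# quota, so the capacity left for children is quota - 1.
name_quota=5
children_allowed=$((name_quota - 1))
echo "$children_allowed"   # 4 names may still be created under the dir
```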
4.2 Space Quotas
A space quota limits the amount of space a directory may use: it is a hard limit on the number of bytes used by the files in the tree rooted at that directory. Block allocation fails if the quota does not allow a full block to be written. Each replica of a block counts against the quota. Quotas stick with renamed directories; the rename fails if it would violate a quota. Newly created directories have no associated quota. The largest quota is Long.MAX_VALUE. A quota of zero still permits files to be created, but no blocks can be added to them. Directories consume no host file system space and do not count against the space quota. The host file system space used to store file metadata is not counted either. A quota is charged at a file's intended replication factor; changing a file's replication factor credits or debits the quota accordingly.
Quotas persist with the fsimage. On startup, if the fsimage contains a directory in violation of its quota (perhaps the fsimage was surreptitiously modified), a warning is printed. Setting or removing a quota creates a journal entry.
hdfs dfsadmin -setSpaceQuota <N> <dir1> <dir2> ...
Sets a hard limit of N bytes on the total size of all files under each directory tree. The space quota counts replicas, so 1 GB of data stored at replication 3 consumes 3 GB of quota. For convenience N can also be given with a binary prefix: 30m for 30 MB, 50g for 50 GB, 2t for 2 TB, and so on. The command is best-effort across the listed directories; it fails for a directory if N is neither zero nor a positive long, if the directory does not exist or is a file, or if the directory would immediately exceed the new quota.
hdfs dfsadmin -clrSpaceQuota <dir1> <dir2> ...
Removes the space quota from each given directory (note that -clrSpaceQuota takes no size argument).
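The replica accounting above is simple multiplication: a file is charged size times replication against the space quota. A sketch with hypothetical numbers:

```shell
# Space-quota charge sketch: every replica of every block counts, so a
# file's charge is (file size in bytes) * (replication factor).
file_size=$((1024 * 1024 * 1024))   # a 1 GB file
replication=3
charged=$((file_size * replication))
echo "$charged bytes charged"       # 3221225472 bytes = 3 GB
```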
References:
https://www.cnblogs.com/tesla-turing/p/11487899.html
https://www.baidu.com/link?url=xe1xYnCFK35Vn_Z1UPJx1mdnJWqI_keefm-iMBNjKijzopIYlMHP9-SgPkCdJpvCVtgir2osQ9SINL5xLxHwpq&wd=&eqid=c78f30470014873e000000035fd816c0