Hadoop Commands for Operating HDFS (with Python)
This article assumes a Hadoop cluster built on CentOS 7; all commands below are tested on the master node, in the following environment:
- CentOS 7
- python 3.6.8
- hadoop-2.7.1
1. Hadoop Commands
(1) View the HDFS directory tree recursively
hadoop fs -ls -R /
(the older -lsr form is deprecated in Hadoop 2.x)
(2) Create a directory
hadoop fs -mkdir /test_xz/input
(add -p to create missing parent directories as well)
(3) Upload a local file to HDFS
hadoop fs -put /home/bailang/test.txt /test_xz/input
(4) Download a file from HDFS to a local directory
hadoop fs -get /test_xz/input/test.txt /home/bailang/
(5) List an HDFS directory
hadoop fs -ls /test_xz
(6) View a file on HDFS
hadoop fs -cat /test_xz/input/test.txt
(7) Delete a file on HDFS
hadoop fs -rm /test_xz/input/test.txt
(8) Delete a directory on HDFS
hadoop fs -rm -r /test_xz/input/
(the older -rmr form is deprecated)
(9) Check HDFS status
hdfs dfsadmin -report
(in Hadoop 2.x, hdfs dfsadmin replaces the deprecated hadoop dfsadmin)
(10) Enter safe mode
hdfs dfsadmin -safemode enter
(11) Leave safe mode
hdfs dfsadmin -safemode leave
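In scripts it can be convenient to drive these same shell commands from Python. A minimal sketch using only the standard library (the `hdfs_cmd` and `run` helper names are illustrative, not part of Hadoop; it requires the hadoop client on PATH):

```python
import subprocess

def hdfs_cmd(*args):
    # Build the argument list for a "hadoop fs" command,
    # e.g. hdfs_cmd("-ls", "/") -> ["hadoop", "fs", "-ls", "/"]
    return ["hadoop", "fs", *args]

def run(*args):
    # Run the command and return its stdout; raises CalledProcessError
    # on a non-zero exit. universal_newlines keeps this Python 3.6-compatible.
    result = subprocess.run(hdfs_cmd(*args), check=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout

# Example (needs a configured cluster):
# print(run("-ls", "/test_xz"))
```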
2. Operating HDFS from Python
(1) Install the hdfs package
pip install hdfs
(2) Read the contents of an HDFS file
from hdfs.client import Client

# 50070 is the default NameNode web/WebHDFS port in Hadoop 2.x
client = Client("http://172.30.11.101:50070")
file_path = "/test_xz/input/test.txt"
lines = []
with client.read(file_path, encoding='utf-8', delimiter='\n') as reader:
    for line in reader:
        lines.append(line.strip())
print(lines)
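The strip-and-collect loop above works for any iterable of lines, so it can be factored into a helper (`collect_lines` is an illustrative name, not part of the hdfs package); as a bonus, the version below also skips blank lines:

```python
def collect_lines(reader):
    # Strip surrounding whitespace from each line and drop
    # lines that are empty after stripping.
    return [line.strip() for line in reader if line.strip()]

# Usage sketch with the HDFS reader (or any local file object):
# with client.read(file_path, encoding='utf-8', delimiter='\n') as reader:
#     lines = collect_lines(reader)
```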
(3) Create a directory on HDFS
from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input"
client.makedirs(hdfs_path)
(4) List the files under a given HDFS directory
from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input"
print(client.list(hdfs_path, status=False))
(5) Move or rename a file
from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
source_path = "/test_xz/input/test.txt"
dst_path = "/test_xz/input/test_new.txt"  # illustrative destination
client.rename(source_path, dst_path)
(6) Upload a file to HDFS
from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input"
local_path = "/home/bailang/test.txt"
client.upload(hdfs_path, local_path, cleanup=True)
(7) Download a file from HDFS to a local directory
from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input/test.txt"
local_path = "/home/bailang"
client.download(hdfs_path, local_path, overwrite=False)
(8) Append data to an HDFS file
from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input/test.txt"
data = "a new line of text\n"  # illustrative payload
client.write(hdfs_path, data, overwrite=False, append=True, encoding='utf-8')
(9) Overwrite an HDFS file with new data
from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input/test.txt"
data = "replacement contents\n"  # illustrative payload
client.write(hdfs_path, data, overwrite=True, append=False, encoding='utf-8')
(10) Delete a file from HDFS
from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input/test.txt"
client.delete(hdfs_path)
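Some of the client calls above (for example `download` with `overwrite=False`, or `upload` to an existing path) raise an error when a path is missing or already present, so a small existence check is often handy. `client.status(path, strict=False)` returns None for a missing path instead of raising; the `exists` wrapper below is an illustrative helper, not part of the hdfs package:

```python
def exists(client, hdfs_path):
    # status(..., strict=False) yields None for a missing path
    # instead of raising an HdfsError.
    return client.status(hdfs_path, strict=False) is not None

# Usage sketch:
# if exists(client, "/test_xz/input/test.txt"):
#     client.download("/test_xz/input/test.txt", "/home/bailang")
```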