Big Data Primer (4): HDFS Shell Syntax

1. Testing HDFS file upload and download (HDFS shell)
   1.0 Show help
       hadoop fs -help <cmd>
   1.1 Upload
       hadoop fs -put <local-file> <hdfs-path>

       hadoop fs -put jdk-7u71-linux-x64.tar.gz hdfs://192.168.1.115:9000/

   1.2 View file contents
       hadoop fs -cat <hdfs-path>
   1.3 List files
       hadoop fs -ls /
   1.4 Download a file
       hadoop fs -get <hdfs-path> <local-file>

       hadoop fs -get hdfs://192.168.1.115:9000/jdk-7u71-linux-x64.tar.gz
   1.5 Create directories (one level at a time, or all levels at once with -mkdir -p)
       hadoop fs -mkdir /aa
       hadoop fs -mkdir /aa/bb

   1.6 Delete a directory
       hadoop fs -rm -r /aa/bb

   1.7 Show the total size of files
       hadoop fs -du -s -h hdfs://192.168.1.115:9000/

   1.8 Copy a file to a directory on another virtual machine
       scp <file> 192.168.1.116:/home/admin
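
The commands in 1.1, 1.4 and 1.7 use a full hdfs:// URI, while 1.3 and 1.5 use a bare path such as /aa/bb. A bare path is resolved against the cluster's configured default file system. A minimal sketch of how the two forms relate, using only java.net.URI (the class name HdfsUriSketch and the helper qualify are made up for illustration, and the default file system address is assumed to be the NameNode used above):

```java
import java.net.URI;

class HdfsUriSketch {
    // Resolve a bare path such as "/aa/bb" against an assumed default file system URI.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        URI full = URI.create("hdfs://192.168.1.115:9000/jdk-7u71-linux-x64.tar.gz");
        System.out.println("scheme = " + full.getScheme()); // file system type
        System.out.println("host   = " + full.getHost());   // NameNode address
        System.out.println("port   = " + full.getPort());   // NameNode RPC port
        System.out.println("path   = " + full.getPath());   // path inside HDFS

        // A bare path gains the scheme, host and port of the default file system.
        System.out.println(qualify("hdfs://192.168.1.115:9000/", "/aa/bb"));
    }
}
```

The last line prints hdfs://192.168.1.115:9000/aa/bb, which is the fully qualified form of the directory created in 1.5.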

2. Testing MapReduce
Run the example programs that ship with Hadoop (in app/hadoop-2.4.1/share/hadoop/mapreduce)

hadoop jar hadoop-mapreduce-examples-2.4.1.jar pi 5 5
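
In the pi example the two arguments are the number of map tasks and the number of samples per map task. Each map task counts how many sample points fall inside a circle, and the reduce step combines the counts into an estimate of pi. A rough single-process sketch of that idea, assuming a Halton point sequence and a quarter circle (PiSketch and its method names are made up here, and this is not the code from the examples jar):

```java
class PiSketch {
    // Radical inverse of index in the given base: element number `index` of a Halton sequence.
    static double halton(int index, int base) {
        double result = 0.0;
        double f = 1.0 / base;
        for (int i = index; i > 0; i /= base) {
            result += f * (i % base);
            f /= base;
        }
        return result;
    }

    // "Map": count the sample points of one task that fall inside the quarter circle.
    static long countInside(int firstIndex, int samples) {
        long inside = 0;
        for (int i = firstIndex; i < firstIndex + samples; i++) {
            double x = halton(i, 2);
            double y = halton(i, 3);
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return inside;
    }

    // "Reduce": add up the per-task counts and turn the ratio into an estimate of pi.
    static double estimatePi(int maps, int samplesPerMap) {
        long inside = 0;
        for (int m = 0; m < maps; m++) {
            inside += countInside(1 + m * samplesPerMap, samplesPerMap);
        }
        return 4.0 * inside / ((long) maps * samplesPerMap);
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(5, 5));     // same sizes as "pi 5 5": very coarse
        System.out.println(estimatePi(10, 1000)); // more samples, closer to pi
    }
}
```

With only 5 x 5 = 25 samples the estimate is coarse; raising either argument increases the sample count and tightens the result.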

   Create a text file: vi test.txt, and write "hello world" into it

   Create an input directory and put the file into it
   hadoop fs -mkdir /wordcount
   hadoop fs -mkdir /wordcount/input
   hadoop fs -put test.txt /wordcount/input

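   Run the job: count how many times each word occurs in test.txt and write the result to the output directory (the output directory must not exist before the job runs)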
   hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount /wordcount/input /wordcount/output

   List the output files: hadoop fs -ls /wordcount/output

   View the output file: hadoop fs -cat /wordcount/output/part-r-00000
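
What the wordcount job computes can be sketched in plain Java without a cluster: the "map" step emits (word, 1) for every word, and the "reduce" step adds up the ones per word. The class WordCountSketch and the sample lines are made up for illustration, and this is not the source of the example in the jar:

```java
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Count word occurrences across all lines; TreeMap keeps the words sorted.
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // "Map": split the line on whitespace, one (word, 1) pair per word.
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // "Reduce": sum the ones that belong to the same word.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one "word<TAB>count" line per word.
        for (Map.Entry<String, Integer> e : count("hello world", "hello hadoop").entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the two sample lines this prints hadoop 1, hello 2 and world 1, one word per line with a tab before the count.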

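2. Operating HDFS through the Java API
See the demo in the Eclipse project

On the local Windows machine, edit the hosts file under C:\Windows\System32\drivers\etc and add the hostname-to-IP mappings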

3. Hadoop communication mechanism
Method calls between different processes (RPC, remote procedure call)
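
The idea of calling a method that lives in another process can be sketched with a JDK dynamic proxy: the caller holds an object that implements the shared interface, and every call on it is forwarded to the real implementation. In this sketch everything stays in one process and dispatch() stands in for the network hop. All names (RpcSketch, GreetingProtocol, GreetingServer, getProxy) are made up for illustration and are not Hadoop classes:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class RpcSketch {
    // Protocol interface known to both the caller and the server.
    interface GreetingProtocol {
        String greet(String name);
    }

    // Server-side implementation; in real RPC it would run in another process.
    static class GreetingServer implements GreetingProtocol {
        public String greet(String name) {
            return "hello " + name;
        }
    }

    // Stand-in for the network: find the method by name and parameter types, then invoke it.
    static Object dispatch(Object server, String methodName, Class<?>[] types, Object[] callArgs)
            throws Exception {
        Method target = server.getClass().getMethod(methodName, types);
        return target.invoke(server, callArgs);
    }

    // Build a client-side proxy whose calls are all forwarded through dispatch().
    static <T> T getProxy(Class<T> protocol, Object server) {
        InvocationHandler handler = (proxy, method, callArgs) ->
                dispatch(server, method.getName(), method.getParameterTypes(), callArgs);
        Object stub = Proxy.newProxyInstance(
                protocol.getClassLoader(), new Class<?>[] {protocol}, handler);
        return protocol.cast(stub);
    }

    public static void main(String[] args) {
        GreetingProtocol client = getProxy(GreetingProtocol.class, new GreetingServer());
        System.out.println(client.greet("namenode")); // looks like a local call
    }
}
```

The caller only sees the interface; only the method name, parameter types and arguments cross the boundary, which is the part a real RPC layer would serialize and send over a socket.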

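4. HDFS source code analysis
   FileSystem.get --> instantiates a DistributedFileSystem through reflection --> which calls new DFSClient() and keeps it as a member variable
   The DFSClient constructor calls createNamenode, which uses the RPC mechanism to obtain a proxy object for the NameNode; through this proxy the client can communicate with the NameNode

   FileSystem --> DistributedFileSystem --> DFSClient --> NameNode proxy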
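
The chain above can be imitated in a few lines: a factory picks an implementation class by scheme, creates it through reflection, and the new object builds a client that holds the server stub. This is only a sketch of the shape of the call chain. Every class name here (FsSketch, DistributedFsSketch, ClientSketch, NameNodeStub) is a hypothetical stand-in, and the stub is a plain lambda rather than a real RPC proxy:

```java
import java.util.HashMap;
import java.util.Map;

class FileSystemChainSketch {
    // Stand-in for the NameNode proxy.
    interface NameNodeStub {
        String describe();
    }

    // Stand-in for DFSClient: creates the stub in its constructor and keeps it.
    static class ClientSketch {
        final NameNodeStub namenode;

        ClientSketch(String address) {
            this.namenode = () -> "namenode@" + address;
        }
    }

    // Stand-in for the abstract FileSystem class.
    abstract static class FsSketch {
        abstract void initialize(String address);

        abstract String whoAmI();
    }

    // Stand-in for DistributedFileSystem: keeps the client as a member variable.
    static class DistributedFsSketch extends FsSketch {
        private ClientSketch client;

        DistributedFsSketch() {
            // no-arg constructor so the class can be created through reflection
        }

        void initialize(String address) {
            client = new ClientSketch(address);
        }

        String whoAmI() {
            return client.namenode.describe();
        }
    }

    // Scheme -> implementation class name, playing the role of configuration.
    static final Map<String, String> IMPLS = new HashMap<>();

    static {
        IMPLS.put("hdfs", DistributedFsSketch.class.getName());
    }

    // Stand-in for FileSystem.get: look up the class, instantiate it reflectively, initialize it.
    static FsSketch get(String scheme, String address) {
        try {
            Class<?> impl = Class.forName(IMPLS.get(scheme));
            FsSketch fs = (FsSketch) impl.getDeclaredConstructor().newInstance();
            fs.initialize(address);
            return fs;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create file system for " + scheme, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(get("hdfs", "192.168.1.115:9000").whoAmI());
    }
}
```

Because the implementation is looked up by name at run time, the caller's code depends only on the abstract type, which is why one entry point can serve different file system implementations.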
