1 Preparation
Start Hadoop first.
1. Format HDFS
bin/hadoop namenode -format
2. Start Hadoop
bin/start-all.sh
3. Verify that everything started
Run the jps command; if six processes are listed (the five Hadoop 1.x daemons — NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker — plus Jps itself), the startup succeeded.
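As a sketch of that check: the jps listing below is illustrative (the PIDs are invented), but a fully started Hadoop 1.x pseudo-distributed node shows exactly these five daemons plus Jps:

```shell
# Illustrative jps output from a fully started Hadoop 1.x node.
# PIDs are made up; only the process names matter.
sample='2481 NameNode
2603 DataNode
2725 SecondaryNameNode
2810 JobTracker
2932 TaskTracker
3050 Jps'
# Count the Hadoop daemons (Jps itself is the sixth line).
daemons=$(echo "$sample" | grep -Ec 'NameNode|DataNode|SecondaryNameNode|JobTracker|TaskTracker')
echo "$daemons daemons running"
```

If any daemon is missing from your jps output, check its log under $HADOOP_HOME/logs before continuing.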
2 Run Steps
1. Create the test folder file
Create a folder named file in a suitable location, and create two files to be counted.
For example, I create the folder file under $HADOOP_HOME, create two test files test1.txt and test2.txt inside it, write some content into each, then save and exit.
mkdir file
cd file
vi test1.txt
vi test2.txt
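The tutorial does not show the file contents. Judging from the counts in the final output (Hello 2, Hadoop! 1, word! 1), the files likely looked as follows; this is an assumed reconstruction, written with printf instead of vi so it is reproducible:

```shell
# Assumed file contents, inferred from the word counts in the final output:
# "Hello" appears twice in total, "Hadoop!" and "word!" once each.
printf 'Hello word!\n'   > test1.txt
printf 'Hello Hadoop!\n' > test2.txt
cat test1.txt test2.txt
```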
2. Create hdfsinput
From the $HADOOP_HOME directory, run:
bin/hadoop fs -mkdir hdfsinput
3. Upload file to hdfsinput
From the $HADOOP_HOME directory, run:
bin/hadoop fs -put file/test*.txt hdfsinput
4. Run wordcount
From the $HADOOP_HOME directory, run:
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount hdfsinput hdfsoutput
Here the output directory is set to hdfsoutput; note that this directory must not already exist in HDFS, or the job will fail, so delete it first when re-running.
The run proceeds as follows:
17/08/11 15:48:37 INFO input.FileInputFormat: Total input paths to process : 2
17/08/11 15:48:37 INFO util.NativeCodeLoader: Loaded the native-hadoop library
17/08/11 15:48:37 WARN snappy.LoadSnappy: Snappy native library not loaded
17/08/11 15:48:38 INFO mapred.JobClient: Running job: job_201708111545_0001
17/08/11 15:48:39 INFO mapred.JobClient: map 0% reduce 0%
17/08/11 15:48:43 INFO mapred.JobClient: map 100% reduce 0%
17/08/11 15:48:50 INFO mapred.JobClient: map 100% reduce 33%
17/08/11 15:48:51 INFO mapred.JobClient: map 100% reduce 100%
17/08/11 15:48:51 INFO mapred.JobClient: Job complete: job_201708111545_0001
17/08/11 15:48:51 INFO mapred.JobClient: Counters: 29
17/08/11 15:48:51 INFO mapred.JobClient: Map-Reduce Framework
17/08/11 15:48:51 INFO mapred.JobClient: Spilled Records=8
17/08/11 15:48:51 INFO mapred.JobClient: Map output materialized bytes=62
17/08/11 15:48:51 INFO mapred.JobClient: Reduce input records=4
17/08/11 15:48:51 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5646143488
17/08/11 15:48:51 INFO mapred.JobClient: Map input records=2
17/08/11 15:48:51 INFO mapred.JobClient: SPLIT_RAW_BYTES=232
17/08/11 15:48:51 INFO mapred.JobClient: Map output bytes=42
17/08/11 15:48:51 INFO mapred.JobClient: Reduce shuffle bytes=62
17/08/11 15:48:51 INFO mapred.JobClient: Physical memory (bytes) snapshot=521117696
17/08/11 15:48:51 INFO mapred.JobClient: Reduce input groups=3
17/08/11 15:48:51 INFO mapred.JobClient: Combine output records=4
17/08/11 15:48:51 INFO mapred.JobClient: Reduce output records=3
17/08/11 15:48:51 INFO mapred.JobClient: Map output records=4
17/08/11 15:48:51 INFO mapred.JobClient: Combine input records=4
17/08/11 15:48:51 INFO mapred.JobClient: CPU time spent (ms)=1640
17/08/11 15:48:51 INFO mapred.JobClient: Total committed heap usage (bytes)=480247808
17/08/11 15:48:51 INFO mapred.JobClient: File Input Format Counters
17/08/11 15:48:51 INFO mapred.JobClient: Bytes Read=26
17/08/11 15:48:51 INFO mapred.JobClient: FileSystemCounters
17/08/11 15:48:51 INFO mapred.JobClient: HDFS_BYTES_READ=258
17/08/11 15:48:51 INFO mapred.JobClient: FILE_BYTES_WRITTEN=156810
17/08/11 15:48:51 INFO mapred.JobClient: FILE_BYTES_READ=56
17/08/11 15:48:51 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=26
17/08/11 15:48:51 INFO mapred.JobClient: Job Counters
17/08/11 15:48:51 INFO mapred.JobClient: Launched map tasks=2
17/08/11 15:48:51 INFO mapred.JobClient: Launched reduce tasks=1
17/08/11 15:48:51 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8131
17/08/11 15:48:51 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
17/08/11 15:48:51 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5149
17/08/11 15:48:51 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
17/08/11 15:48:51 INFO mapred.JobClient: Data-local map tasks=2
17/08/11 15:48:51 INFO mapred.JobClient: File Output Format Counters
17/08/11 15:48:51 INFO mapred.JobClient: Bytes Written=26
5. Check whether the job succeeded
If the job succeeded, run
bin/hadoop fs -ls hdfsoutput
to list the output directory hdfsoutput; it will contain the following entries:
Found 3 items
-rw-r--r-- 3 root supergroup 0 2017-08-11 15:48 /user/root/hdfsoutput/_SUCCESS
drwxr-xr-x - root supergroup 0 2017-08-11 15:48 /user/root/hdfsoutput/_logs
-rw-r--r-- 3 root supergroup 26 2017-08-11 15:48 /user/root/hdfsoutput/part-r-00000
The actual results are in the file hdfsoutput/part-r-00000.
6. View the results
View the file hdfsoutput/part-r-00000 with:
bin/hadoop fs -cat hdfsoutput/part-r-00000
The output is:
Hadoop! 1
Hello 2
word! 1
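Conceptually, wordcount tokenizes each line on whitespace in the map phase and sums the occurrences of each distinct token in the reduce phase. Assuming the file contents guessed earlier, the same counts can be reproduced locally with standard shell tools:

```shell
# Local equivalent of wordcount: split on whitespace, sort, count duplicates.
# Input is assumed to match test1.txt and test2.txt from section 2.
printf 'Hello word!\nHello Hadoop!\n' \
  | tr -s '[:space:]' '\n' \
  | sort \
  | uniq -c
# Prints counts matching part-r-00000 (uniq -c's column layout differs):
#   1 Hadoop!
#   2 Hello
#   1 word!
```

This only mimics the computation, of course; the point of the MapReduce version is that the tokenizing and counting are distributed across the cluster.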
3 Troubleshooting
When running
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount hdfsinput hdfsoutput
check the log line "Total input paths to process"; if the number of input files it reports is 0, the input was not picked up. There are two likely causes:
- The input path is wrong, so no files exist at that location
- The command itself is wrong, e.g. the example name wordcount does not match the name registered in the jar
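One quick way to spot this is to pull the count out of that log line; the line below is copied from the successful run in section 2, and the extraction is just a shell sketch:

```shell
# The FileInputFormat line in the job log reports how many input files
# were found; 0 here means the upload or the input path is wrong.
line='17/08/11 15:48:37 INFO input.FileInputFormat: Total input paths to process : 2'
count=${line##*: }   # strip everything up to the last ': '
echo "input paths: $count"
```

Running bin/hadoop fs -ls hdfsinput before submitting the job is another easy sanity check that the upload in step 3 actually worked.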