Hadoop Series: Running Hadoop's Built-in wordcount Example

Copyright notice: This is the author's original article; reposting without permission is prohibited. https://blog.csdn.net/qq_33429968/article/details/77102936

1 Preparation

  Start Hadoop first.
1. Format HDFS
  

bin/hadoop namenode -format

2. Start Hadoop

bin/start-all.sh

3. Verify that everything started
  Run the jps command. If six processes are listed (for a Hadoop 1.x pseudo-distributed setup these are typically NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker, and Jps itself), Hadoop started successfully.
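
  The jps check can be scripted. The snippet below filters a jps-style listing for the five expected daemons and counts them; the sample output is illustrative, not captured from a real cluster:

```shell
# Sample jps output (illustrative); on a live node you would pipe `jps` directly.
sample='2901 Jps
2345 NameNode
2446 DataNode
2547 SecondaryNameNode
2648 JobTracker
2749 TaskTracker'
# Count lines naming one of the five Hadoop 1.x daemons (Jps itself is excluded).
count=$(printf '%s\n' "$sample" | grep -Ec 'NameNode|DataNode|SecondaryNameNode|JobTracker|TaskTracker')
echo "daemons running: $count"
```

  With all daemons up, the count is 5; together with the Jps process itself that makes the six entries mentioned above.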

2 Steps

1. Create a test folder file
  Create a folder named file in a suitable location, with two files to be counted inside it.
  For example, I created the folder file under $HADOOP_HOME, created two test files test1.txt and test2.txt in it, wrote some content into each, then saved and exited.

mkdir file
cd file
vi test1.txt
vi test2.txt

2. Create hdfsinput on HDFS
  From the $HADOOP_HOME directory, run:
  

bin/hadoop fs -mkdir hdfsinput

3. Upload file to hdfsinput
  From the $HADOOP_HOME directory, run:
  

bin/hadoop fs -put file/test*.txt hdfsinput

4. Run wordcount
  From the $HADOOP_HOME directory, run:

bin/hadoop jar hadoop-examples-1.2.1.jar wordcount hdfsinput hdfsoutput
  

  Here the output directory is set to hdfsoutput. (The jar file name matches Hadoop 1.2.1; adjust it to your version.)
  The run prints output like the following:
  

17/08/11 15:48:37 INFO input.FileInputFormat: Total input paths to process : 2
17/08/11 15:48:37 INFO util.NativeCodeLoader: Loaded the native-hadoop library
17/08/11 15:48:37 WARN snappy.LoadSnappy: Snappy native library not loaded
17/08/11 15:48:38 INFO mapred.JobClient: Running job: job_201708111545_0001
17/08/11 15:48:39 INFO mapred.JobClient:  map 0% reduce 0%
17/08/11 15:48:43 INFO mapred.JobClient:  map 100% reduce 0%
17/08/11 15:48:50 INFO mapred.JobClient:  map 100% reduce 33%
17/08/11 15:48:51 INFO mapred.JobClient:  map 100% reduce 100%
17/08/11 15:48:51 INFO mapred.JobClient: Job complete: job_201708111545_0001
17/08/11 15:48:51 INFO mapred.JobClient: Counters: 29
17/08/11 15:48:51 INFO mapred.JobClient:   Map-Reduce Framework
17/08/11 15:48:51 INFO mapred.JobClient:     Spilled Records=8
17/08/11 15:48:51 INFO mapred.JobClient:     Map output materialized bytes=62
17/08/11 15:48:51 INFO mapred.JobClient:     Reduce input records=4
17/08/11 15:48:51 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=5646143488
17/08/11 15:48:51 INFO mapred.JobClient:     Map input records=2
17/08/11 15:48:51 INFO mapred.JobClient:     SPLIT_RAW_BYTES=232
17/08/11 15:48:51 INFO mapred.JobClient:     Map output bytes=42
17/08/11 15:48:51 INFO mapred.JobClient:     Reduce shuffle bytes=62
17/08/11 15:48:51 INFO mapred.JobClient:     Physical memory (bytes) snapshot=521117696
17/08/11 15:48:51 INFO mapred.JobClient:     Reduce input groups=3
17/08/11 15:48:51 INFO mapred.JobClient:     Combine output records=4
17/08/11 15:48:51 INFO mapred.JobClient:     Reduce output records=3
17/08/11 15:48:51 INFO mapred.JobClient:     Map output records=4
17/08/11 15:48:51 INFO mapred.JobClient:     Combine input records=4
17/08/11 15:48:51 INFO mapred.JobClient:     CPU time spent (ms)=1640
17/08/11 15:48:51 INFO mapred.JobClient:     Total committed heap usage (bytes)=480247808
17/08/11 15:48:51 INFO mapred.JobClient:   File Input Format Counters 
17/08/11 15:48:51 INFO mapred.JobClient:     Bytes Read=26
17/08/11 15:48:51 INFO mapred.JobClient:   FileSystemCounters
17/08/11 15:48:51 INFO mapred.JobClient:     HDFS_BYTES_READ=258
17/08/11 15:48:51 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=156810
17/08/11 15:48:51 INFO mapred.JobClient:     FILE_BYTES_READ=56
17/08/11 15:48:51 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=26
17/08/11 15:48:51 INFO mapred.JobClient:   Job Counters 
17/08/11 15:48:51 INFO mapred.JobClient:     Launched map tasks=2
17/08/11 15:48:51 INFO mapred.JobClient:     Launched reduce tasks=1
17/08/11 15:48:51 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8131
17/08/11 15:48:51 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
17/08/11 15:48:51 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5149
17/08/11 15:48:51 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
17/08/11 15:48:51 INFO mapred.JobClient:     Data-local map tasks=2
17/08/11 15:48:51 INFO mapred.JobClient:   File Output Format Counters 
17/08/11 15:48:51 INFO mapred.JobClient:     Bytes Written=26
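
  The framework counters can be sanity-checked by hand. Assuming the two input lines "Hello word!" and "Hello Hadoop!" (inferred from the results in step 6), the key numbers line up:

```shell
# Reproduce three of the Map-Reduce Framework counters locally.
input='Hello word!
Hello Hadoop!'
map_in=$(printf '%s\n' "$input" | wc -l)                            # Map input records  = input lines     = 2
map_out=$(printf '%s\n' "$input" | wc -w)                           # Map output records = words emitted   = 4
red_out=$(printf '%s\n' "$input" | tr ' ' '\n' | sort -u | wc -l)   # Reduce output records = distinct words = 3
echo "$map_in $map_out $red_out"
```

  These match Map input records=2, Map output records=4, and Reduce output records=3 in the log above.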

5. Check whether the job succeeded
  If the job succeeded, run
  

bin/hadoop fs -ls hdfsoutput

  to list the output directory hdfsoutput. It should contain the following entries:
  

Found 3 items
-rw-r--r--   3 root supergroup          0 2017-08-11 15:48 /user/root/hdfsoutput/_SUCCESS
drwxr-xr-x   - root supergroup          0 2017-08-11 15:48 /user/root/hdfsoutput/_logs
-rw-r--r--   3 root supergroup         26 2017-08-11 15:48 /user/root/hdfsoutput/part-r-00000
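
  When scripting, the results path can be pulled out of the fs -ls listing: the path is the last whitespace-separated field of each line. A sketch against the sample line above:

```shell
# Extract the file path (last field) from a hadoop fs -ls output line.
listing='-rw-r--r--   3 root supergroup         26 2017-08-11 15:48 /user/root/hdfsoutput/part-r-00000'
result=$(echo "$listing" | awk '{print $NF}')
echo "$result"
```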

  The word counts are written to the file hdfsoutput/part-r-00000.
6. View the results
  Print hdfsoutput/part-r-00000 with:

bin/hadoop fs -cat hdfsoutput/part-r-00000

  The results look like this:
  

Hadoop! 1
Hello   2
word!   1
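
  The same counts can be reproduced locally with standard shell tools, again assuming the two input lines inferred earlier:

```shell
# Local word count: one word per line, sort, count duplicates,
# then reorder the fields to match wordcount's "word count" output.
counts=$(printf 'Hello word!\nHello Hadoop!\n' | tr ' ' '\n' | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
```

  This prints the same three word/count pairs as part-r-00000 above.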

3 Troubleshooting

  When running the command

bin/hadoop jar hadoop-examples-1.2.1.jar wordcount hdfsinput hdfsoutput

watch the console output: if the number of input paths to process is 0, the job found no input. There are two common causes:

  • The input path is wrong, so the files do not exist on HDFS
  • The command itself is wrong, e.g. the program name wordcount or the jar file name is misspelled
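
  This check can be automated by parsing the FileInputFormat line out of a saved job log. The log line below is the sample from the run above, used here for illustration:

```shell
# Pull the input-path count off the end of the FileInputFormat log line.
logline='17/08/11 15:48:37 INFO input.FileInputFormat: Total input paths to process : 2'
n=${logline##*: }      # strip everything up to and including the last ': '
if [ "$n" -gt 0 ]; then
  echo "input OK: $n path(s)"
else
  echo "no input files found -- check the HDFS path and the command line"
fi
```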
