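Java must be installed first. Installing and configuring Java is not covered in detail here. The Java and Hadoop versions used in this walkthrough: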
[root@swarm01 ~]# java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
[root@swarm01 ~]# hadoop version
Hadoop 2.7.7
Subversion Unknown -r c1aad84bd27cd79c3d1a7dd58202a8c3ee1ed3ac
Compiled by stevel on 2018-07-18T22:47Z
Compiled with protoc 2.5.0
From source with checksum 792e15d20b12c74bd6f19a1fb886490
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.7.jar
[root@swarm01 ~]#
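Because the example depends on both Java and Hadoop, a quick preflight check that both commands are on the PATH can catch a setup problem before the job fails with a less obvious error. This is a minimal sketch. The helper name `require_cmd` is hypothetical, and the check relies only on the POSIX `command -v` builtin.

```shell
#!/bin/sh
# Preflight check (hypothetical helper): verify that required commands are on PATH.
# require_cmd NAME -> prints where NAME was found and returns 0,
#                     or prints a message to stderr and returns 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
        return 0
    fi
    echo "missing: $1 (install it or add it to PATH)" >&2
    return 1
}

# Check every tool and report all missing ones instead of stopping at the first.
status=0
for cmd in java hadoop; do
    require_cmd "$cmd" || status=1
done

if [ "$status" -eq 0 ]; then
    echo "environment looks ready"
else
    echo "fix the missing tools before running the example" >&2
fi
```

If both lines report `found:`, `java -version` and `hadoop version` should work as shown above.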
grep example demo
Create a folder named grep_demo to hold the input and output of the grep job:
[root@swarm01 swarm01]# mkdir grep_demo
[root@swarm01 swarm01]# ll
total 213600
drwxr-xr-x. 2 root root 6 Apr 14 21:27 grep_demo
-rw-r--r--. 1 root root 218720521 Jul 20 2018 hadoop-2.7.7.tar.gz
drwxr-xr-x. 3 root root 60 Apr 14 11:37 java-8
-rw-r--r--. 1 root root 1506 Apr 12 11:00 vi.text
drwxr-xr-x. 4 root root 33 Apr 14 13:16 word_regex
[root@swarm01 swarm01]#
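You can copy some of Hadoop's own files into the input folder, or create your own input files. I created my own here, because that makes the result easier to interpret.
The preparation steps below are run inside the grep_demo folder: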
[root@swarm01 swarm01]# cd grep_demo/
[root@swarm01 grep_demo]# ll
total 0
[root@swarm01 grep_demo]# mkdir input
[root@swarm01 grep_demo]# ll
total 0
drwxr-xr-x. 2 root root 6 Apr 14 21:29 input
[root@swarm01 grep_demo]# cd input/
[root@swarm01 input]# touch sakura_demo.xml
[root@swarm01 input]# touch licunzhi_demo.xml
[root@swarm01 input]# ll
total 0
-rw-r--r--. 1 root root 0 Apr 14 21:30 licunzhi_demo.xml
-rw-r--r--. 1 root root 0 Apr 14 21:30 sakura_rain.xml
After adding a few sample lines to both files (the editing step is not shown), their contents are:
[root@swarm01 input]# cat licunzhi_demo.xml
licunzhi_demo_001
sakura_demo_licunzhi
[root@swarm01 input]# cat sakura_demo.xml
sakura_demo_001
demo_001_sakura
[root@swarm01 input]#
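`touch` only creates empty files, so the sample lines have to be added afterwards with an editor. As an alternative, the same input can be written in one go with here-documents. The sketch below reproduces exactly the file contents printed by `cat` above. It assumes it is run from the directory in which grep_demo should live (the `swarm01` directory in the transcript).

```shell
#!/bin/sh
# Recreate the sample input for the grep demo without opening an editor.
mkdir -p grep_demo/input   # -p: no error if the folders already exist
cd grep_demo/input || exit 1

# Contents match the `cat` output shown in the transcript.
cat > licunzhi_demo.xml <<'EOF'
licunzhi_demo_001
sakura_demo_licunzhi
EOF

cat > sakura_demo.xml <<'EOF'
sakura_demo_001
demo_001_sakura
EOF

ls -l
cd ../..   # return to the starting directory
```

Quoting the delimiter (`'EOF'`) stops the shell from expanding anything inside the here-document, so the lines are written literally.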
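Now run the job from the grep_demo folder. The job creates the output folder itself, so do not create output beforehand. If it already exists, the job fails.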
[root@swarm01 grep_demo]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep input output 'sakura_[a-z.]+'
19/04/14 09:35:45 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/04/14 09:35:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/04/14 09:35:47 INFO input.FileInputFormat: Total input paths to process : 4
19/04/14 09:35:47 INFO mapreduce.JobSubmitter: number of splits:4
19/04/14 09:35:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1775879846_0001
19/04/14 09:35:48 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/04/14 09:35:48 INFO mapreduce.Job: Running job: job_local1775879846_0001
19/04/14 09:35:48 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/04/14 09:35:48 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
...... (the rest of the log output is omitted here)
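Because the job fails when the output folder already exists, a rerun is easier with a small guard that checks for it first. This is a sketch under assumptions. The helper `output_is_free` is hypothetical, and the jar path is the one from the transcript. The `job_local...` id and `LocalJobRunner` in the log show that this run uses local (standalone) mode, so input and output are ordinary local directories and a plain `[ -e ]` test applies.

```shell
#!/bin/sh
# Rerun helper (hypothetical): check that the output directory does not exist
# before submitting the grep example. Run from inside grep_demo.
EXAMPLES_JAR=/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar

# output_is_free DIR -> 0 if DIR does not exist yet, 1 (with a message) otherwise.
output_is_free() {
    if [ -e "$1" ]; then
        echo "'$1' already exists; remove it (rm -r $1) or choose another output name" >&2
        return 1
    fi
    return 0
}

# Only submit when the output path is free and hadoop is actually available.
if output_is_free output && command -v hadoop >/dev/null 2>&1; then
    hadoop jar "$EXAMPLES_JAR" grep input output 'sakura_[a-z.]+'
fi
```

On a cluster where input and output live in HDFS, the local test would not see them. There, `hadoop fs -test -e output` is the corresponding existence check.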
Final result:
[root@swarm01 grep_demo]# ll
total 0
drwxr-xr-x. 2 root root 102 Apr 14 21:30 input
drwxr-xr-x. 2 root root 88 Apr 14 21:35 output
[root@swarm01 grep_demo]# cd output/
[root@swarm01 output]# ll
total 0
-rw-r--r--. 1 root root 0 Apr 14 21:35 part-r-00000
-rw-r--r--. 1 root root 0 Apr 14 21:35 _SUCCESS
[root@swarm01 output]#
[root@swarm01 grep_demo]# cat output/part-r-00000
2 sakura_demo
[root@swarm01 grep_demo]#
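The result `2 sakura_demo` follows from what the grep example does. It collects every substring matching the regex, counts each distinct match, and sorts the counts in descending order. In the sample input, both `sakura_demo_licunzhi` and `sakura_demo_001` yield `sakura_demo`, because `[a-z.]+` stops at the next underscore. `demo_001_sakura` contains no `sakura_` and does not match. A rough local cross-check with standard tools is sketched below. The function name `local_grep_count` is hypothetical. `grep -E` uses POSIX extended regex while Hadoop uses `java.util.regex`, so this only agrees with Hadoop for simple patterns like this one.

```shell
#!/bin/sh
# Local cross-check (sketch): approximate the Hadoop grep example's result.
# Prints "<count><TAB><match>" lines, highest count first, like part-r-00000.
local_grep_count() {
    search_dir=$1
    pattern=$2
    # -o: one match per line, -h: omit file names, -E: extended regex
    grep -ohE "$pattern" "$search_dir"/* |
        sort |            # group identical matches together
        uniq -c |         # count each distinct match ("   N match")
        sort -k1,1nr |    # highest count first
        awk '{ c = $1; sub(/^ *[0-9]+ /, ""); printf "%s\t%s\n", c, $0 }'
}

# Run against the demo input when present (run from inside grep_demo).
if [ -d input ]; then
    local_grep_count input 'sakura_[a-z.]+'
fi
```

With the two sample files above, this prints the same single line as `part-r-00000`. Counts tied at the same value may appear in a different order than in Hadoop's output.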
_SUCCESS is the marker file indicating that the job completed successfully.
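Hadoop's `FileOutputCommitter` writes the empty `_SUCCESS` file into the output directory when a job commits (controlled by `mapreduce.fileoutputcommitter.marksuccessfuljobs`, which defaults to true). A script can therefore check for the marker before trusting the `part-*` files. This is a minimal sketch. The helper name `show_results` is hypothetical.

```shell
#!/bin/sh
# Print job results only if the job finished successfully.
# show_results DIR (hypothetical helper) -> prints DIR/part-* files when
# DIR/_SUCCESS exists, otherwise prints a message and returns 1.
show_results() {
    if [ ! -f "$1/_SUCCESS" ]; then
        echo "no _SUCCESS marker in $1; the job failed or has not finished" >&2
        return 1
    fi
    cat "$1"/part-*
}

# Run from inside grep_demo after the job.
if [ -d output ]; then
    show_results output
fi
```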