I. Goal
Use Oozie to schedule a simple WordCount MapReduce job.
II. Steps
1. Create a working directory under the Oozie root
mkdir oozie-apps
2. Extract the examples package bundled in the Oozie root
tar -zxf oozie-examples.tar.gz
3. Copy the map-reduce template into oozie-apps and rename it
cp -ra examples/apps/map-reduce/ oozie-apps/
cd oozie-apps/
mv map-reduce/ mr-wordcount
4. Edit the two key files
(1) job.properties
Purpose: defines the property values that are substituted into workflow.xml.
nameNode=hdfs://hadoop:8020
jobTracker=hadoop:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/hadoop/${examplesRoot}/mr-wordcount/workflow.xml
outputDir=map-reduce
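The `${...}` placeholders above are resolved by Oozie's EL substitution before the workflow runs. As a rough illustration of how the application path expands (plain shell variable expansion is used here only to mimic the result; Oozie performs its own substitution):

```shell
# Illustrative only: mimic Oozie's EL substitution with shell variables.
nameNode='hdfs://hadoop:8020'
examplesRoot='oozie-apps'
# oozie.wf.application.path resolves to:
echo "${nameNode}/user/hadoop/${examplesRoot}/mr-wordcount/workflow.xml"
```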
(2) workflow.xml
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/hadoop/${examplesRoot}/mr-wordcount/output"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<!--New API-->
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<!-- All of these property names can be found by keyword search on the
     configuration page of a previously run wordcount job in the JobHistory UI -->
<!--mapper class-->
<property>
<name>mapreduce.job.map.class</name>
<value>com.bigdata.hadoop.WordCountMR$WordCountMapper</value>
</property>
<property>
<name>mapreduce.map.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.map.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<!--reducer class-->
<property>
<name>mapreduce.job.reduce.class</name>
<value>com.bigdata.hadoop.WordCountMR$WordCountReducer</value>
</property>
<property>
<name>mapreduce.job.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.job.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<!--INPUT-->
<property>
<name>mapred.input.dir</name>
<value>${nameNode}/user/hadoop/${examplesRoot}/mr-wordcount/input</value>
</property>
<!--OUTPUT-->
<property>
<name>mapred.output.dir</name>
<value>${nameNode}/user/hadoop/${examplesRoot}/mr-wordcount/output</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
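The mapper and reducer classes referenced above (WordCountMapper / WordCountReducer in wc.jar) implement a standard word count. For intuition, the same split-then-count computation can be sketched locally with shell pipes (the sample input here is made up, not taken from wc.data):

```shell
# Tokenize on whitespace (the "map" step), then count occurrences
# per word (the "reduce" step) -- the same logic the MR job performs.
printf 'hello world\nhello oozie\n' \
  | tr -s ' \t' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2, $1}'
```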
5. Install your own WordCount jar
In /opt/modules/oozie-4.1.0-cdh5.7.0/oozie-apps/mr-wordcount/lib, delete the bundled example jar oozie-examples-4.1.0-cdh5.7.0.jar:
rm lib/oozie-examples-4.1.0-cdh5.7.0.jar
Then copy in your own test jar:
cp /opt/datas/wc.jar lib/
Note: the test jar can be downloaded from https://download.csdn.net/download/u010886217/10831358
6. Create a test-data directory
Under /opt/modules/oozie-4.1.0-cdh5.7.0/oozie-apps/mr-wordcount:
mkdir input
Copy in the test data:
cp /opt/datas/wc.data /opt/modules/oozie-4.1.0-cdh5.7.0/oozie-apps/mr-wordcount/input/
7. Upload to HDFS
This step is required: Oozie runs the application from the files on HDFS. Whenever the local configuration changes, delete the old copy on HDFS first and then re-upload.
bin/hdfs dfs -mkdir -p /user/hadoop
bin/hdfs dfs -put /opt/modules/oozie-4.1.0-cdh5.7.0/oozie-apps/ /user/hadoop
III. Run the test
1. Prerequisites: make sure the following are running:
(1) HDFS
(2) YARN
(3) JobHistory Server (also required; it reports job status back to Oozie)
2. Start Oozie
bin/oozied.sh start
3. Submit the job
bin/oozie job -oozie http://hadoop:11000/oozie -config oozie-apps/mr-wordcount/job.properties -run
4. Check the results
http://hadoop:11000/oozie/
In the YARN UI there are two new applications: the Oozie launcher runs first, then the MapReduce job from your jar.
http://hadoop:9090/cluster
Inspect the output directory: /user/hadoop/oozie-apps/mr-wordcount/output should now contain the word-count results.