1. Problem Description:
The task is to find the data we care about within a given data set, i.e., to mine the information implied by the raw data. Consider the following example.
Given a child-parent table, produce the corresponding grandchild-grandparent table.
=================Sample input:===================
child parent
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma
Family tree of the sample data (figure not included):
=================Sample output:===================
grandchild grandparent
Tom Alice
Tom Jesse
Jone Alice
Jone Jesse
Tom Mary
Tom Ben
Jone Mary
Jone Ben
Philip Alice
Philip Jesse
Mark Alice
Mark Jesse
2. Design Approach:
Take one pair of records from the sample as an example:
child parent
Tom Lucy
Lucy Mary
mapper code snippet:
context.write(new Text(values[0]), new Text(values[1]+"_1")); // key is the child of value, e.g. key:Tom value:Lucy_1
context.write(new Text(values[1]), new Text(values[0]+"_2")); // key is the parent of value, e.g. key:Lucy value:Tom_2
In other words, for every input line the mapper emits the record twice, once in each direction, with a tag on the value: _1 means the value is the key's parent, _2 means the value is the key's child. After the shuffle, each person's key collects both that person's parents and that person's children, so the reducer can pair them up as grandchild-grandparent records.
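To see the whole tag-and-join idea without a cluster, here is a minimal plain-Java sketch of the same logic (the class and method names FamilyJoinSketch and joinGrandparents are illustrative, not part of the original job):

```java
import java.util.*;

// Standalone sketch (no Hadoop needed) of the tag-and-join technique.
public class FamilyJoinSketch {
    public static List<String[]> joinGrandparents(List<String[]> childParent) {
        // "map" phase: emit each record twice, tagged by role, grouped by key
        Map<String, List<String>> grouped = new HashMap<>();
        for (String[] cp : childParent) {
            String child = cp[0], parent = cp[1];
            // key = child, value = the key's parent, tagged _1
            grouped.computeIfAbsent(child, k -> new ArrayList<>()).add(parent + "_1");
            // key = parent, value = the key's child, tagged _2
            grouped.computeIfAbsent(parent, k -> new ArrayList<>()).add(child + "_2");
        }
        // "reduce" phase: for each person, cross-join their children with their parents
        List<String[]> result = new ArrayList<>();
        for (List<String> vals : grouped.values()) {
            List<String> parents = new ArrayList<>();   // this person's parents -> grandparents
            List<String> children = new ArrayList<>();  // this person's children -> grandchildren
            for (String v : vals) {
                if (v.endsWith("_1")) parents.add(v.substring(0, v.length() - 2));
                else if (v.endsWith("_2")) children.add(v.substring(0, v.length() - 2));
            }
            for (String gc : children)
                for (String gp : parents)
                    result.add(new String[]{gc, gp});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> input = Arrays.asList(
            new String[]{"Tom", "Lucy"},
            new String[]{"Lucy", "Mary"});
        for (String[] pair : joinGrandparents(input))
            System.out.println(pair[0] + "\t" + pair[1]); // prints: Tom	Mary
    }
}
```

For the pair Tom-Lucy / Lucy-Mary, only the group keyed by Lucy has both a child (Tom_2) and a parent (Mary_1), so exactly one record Tom-Mary is produced, matching the design above.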
3. Program Code:
FamilyMapper.java
package com.company.family;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FamilyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // value: "Tom Lucy"
        String line = value.toString();
        // assumes fields are separated by a single space
        String[] values = line.split(" ");
        // key is the child, value is its parent tagged _1, e.g. key:Tom value:Lucy_1
        context.write(new Text(values[0]), new Text(values[1] + "_1"));
        // key is the parent, value is its child tagged _2, e.g. key:Lucy value:Tom_2
        context.write(new Text(values[1]), new Text(values[0] + "_2"));
    }
}
FamilyReducer.java
package com.company.family;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FamilyReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // e.g. key:Lucy values:{Tom_2, Jone_2, Mary_1, Ben_1}
        List<String> grandparents = new ArrayList<String>(); // values tagged _1: the key's parents
        List<String> grandchildren = new ArrayList<String>(); // values tagged _2: the key's children
        for (Text val : values) {
            if (val.toString().endsWith("_1")) {
                grandparents.add(val.toString());
            } else if (val.toString().endsWith("_2")) {
                grandchildren.add(val.toString());
            }
        }
        // cross-join, e.g. for key Lucy this emits:
        // Tom Mary
        // Tom Ben
        // Jone Mary
        // Jone Ben
        for (String child : grandchildren) {
            for (String gp : grandparents) {
                // strip the trailing "_1"/"_2" tag before writing
                context.write(new Text(child.substring(0, child.length() - 2)),
                        new Text(gp.substring(0, gp.length() - 2)));
            }
        }
    }
}
FamilyRunner.java
package com.company.family;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FamilyRunner {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // describe the job
        // jar containing this job
        job.setJarByClass(FamilyRunner.class);
        // the job's Mapper
        job.setMapperClass(FamilyMapper.class);
        // the job's Reducer
        job.setReducerClass(FamilyReducer.class);
        // Mapper output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // Reducer output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // input path for the job
        FileInputFormat.setInputPaths(job, new Path("/Users/xuran/Desktop/week"));
        // output path for the job (must not already exist)
        FileOutputFormat.setOutputPath(job, new Path("/Users/xuran/Desktop/week/result"));
        // submit the job and wait for it to finish
        boolean waitForCompletion = job.waitForCompletion(true);
        System.exit(waitForCompletion ? 0 : 1);
    }
}
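Assuming the three classes above are packaged into a jar (the name family.jar below is hypothetical), the job would typically be submitted with the hadoop command; on a real cluster the hard-coded local paths in FamilyRunner would be replaced with HDFS paths:

```shell
# Submit the job; the jar name is illustrative, not from the original post.
# Note the output directory must not exist before the run.
hadoop jar family.jar com.company.family.FamilyRunner
```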