1. Problem Description:
The task is to find the data we care about within a given data set, i.e., to mine the information implied by the raw data. Consider the following example.
Given a child-parent table, produce the corresponding grandchild-grandparent table.
=================Sample input:===================
child parent
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma
Family tree of the sample data (figure not included):
=================Sample output:===================
grandchild grandparent
Tom Alice
Tom Jesse
Jone Alice
Jone Jesse
Tom Mary
Tom Ben
Jone Mary
Jone Ben
Philip Alice
Philip Jesse
Mark Alice
Mark Jesse
2. Design Approach:
Take one pair of records from the sample as an example:
child parent
Tom Lucy
Lucy Mary
mapper code snippet:
context.write(new Text(values[0]), new Text(values[1]+"_1")); // key is the child of value, e.g. key:Tom value:Lucy_1
context.write(new Text(values[1]), new Text(values[0]+"_2")); // key is the parent of value, e.g. key:Lucy value:Tom_2
In other words, for every input line the mapper emits the record twice, once in each direction, with a tag on the value: _1 means the value is the key's parent, _2 means the value is the key's child. After the shuffle, each person's key collects both that person's parents and that person's children, so the reducer can pair them up as grandchild-grandparent records.
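To see the whole tag-and-join idea without a cluster, here is a minimal plain-Java sketch of the same logic (the class and method names FamilyJoinSketch and joinGrandparents are illustrative, not part of the original job):

```java
import java.util.*;

// Standalone sketch (no Hadoop needed) of the tag-and-join technique.
public class FamilyJoinSketch {
    public static List<String[]> joinGrandparents(List<String[]> childParent) {
        // "map" phase: emit each record twice, tagged by role, grouped by key
        Map<String, List<String>> grouped = new HashMap<>();
        for (String[] cp : childParent) {
            String child = cp[0], parent = cp[1];
            // key = child, value = the key's parent, tagged _1
            grouped.computeIfAbsent(child, k -> new ArrayList<>()).add(parent + "_1");
            // key = parent, value = the key's child, tagged _2
            grouped.computeIfAbsent(parent, k -> new ArrayList<>()).add(child + "_2");
        }
        // "reduce" phase: for each person, cross-join their children with their parents
        List<String[]> result = new ArrayList<>();
        for (List<String> vals : grouped.values()) {
            List<String> parents = new ArrayList<>();   // this person's parents -> grandparents
            List<String> children = new ArrayList<>();  // this person's children -> grandchildren
            for (String v : vals) {
                if (v.endsWith("_1")) parents.add(v.substring(0, v.length() - 2));
                else if (v.endsWith("_2")) children.add(v.substring(0, v.length() - 2));
            }
            for (String gc : children)
                for (String gp : parents)
                    result.add(new String[]{gc, gp});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> input = Arrays.asList(
            new String[]{"Tom", "Lucy"},
            new String[]{"Lucy", "Mary"});
        for (String[] pair : joinGrandparents(input))
            System.out.println(pair[0] + "\t" + pair[1]); // prints: Tom	Mary
    }
}
```

For the pair Tom-Lucy / Lucy-Mary, only the group keyed by Lucy has both a child (Tom_2) and a parent (Mary_1), so exactly one record Tom-Mary is produced, matching the design above.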
3. Program Code:
FamilyMapper.java
package com.company.family;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FamilyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // value: "Tom Lucy"
        String line = value.toString();
        // assumes fields are separated by a single space
        String[] values = line.split(" ");
        // key is the child, value is its parent tagged _1, e.g. key:Tom value:Lucy_1
        context.write(new Text(values[0]), new Text(values[1] + "_1"));
        // key is the parent, value is its child tagged _2, e.g. key:Lucy value:Tom_2
        context.write(new Text(values[1]), new Text(values[0] + "_2"));
    }
}
FamilyReducer.java
package com.company.family;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FamilyReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // e.g. key:Lucy values:{Tom_2, Jone_2, Mary_1, Ben_1}
        List<String> grandparents = new ArrayList<String>(); // values tagged _1: the key's parents
        List<String> grandchildren = new ArrayList<String>(); // values tagged _2: the key's children
        for (Text val : values) {
            if (val.toString().endsWith("_1")) {
                grandparents.add(val.toString());
            } else if (val.toString().endsWith("_2")) {
                grandchildren.add(val.toString());
            }
        }
        // cross-join, e.g. for key Lucy this emits:
        // Tom Mary
        // Tom Ben
        // Jone Mary
        // Jone Ben
        for (String child : grandchildren) {
            for (String gp : grandparents) {
                // strip the trailing "_1"/"_2" tag before writing
                context.write(new Text(child.substring(0, child.length() - 2)),
                        new Text(gp.substring(0, gp.length() - 2)));
            }
        }
    }
}
FamilyRunner.java
package com.company.family;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FamilyRunner {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // describe the job
        // jar containing this job
        job.setJarByClass(FamilyRunner.class);
        // the job's Mapper
        job.setMapperClass(FamilyMapper.class);
        // the job's Reducer
        job.setReducerClass(FamilyReducer.class);
        // Mapper output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // Reducer output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // input path for the job
        FileInputFormat.setInputPaths(job, new Path("/Users/xuran/Desktop/week"));
        // output path for the job (must not already exist)
        FileOutputFormat.setOutputPath(job, new Path("/Users/xuran/Desktop/week/result"));
        // submit the job and wait for it to finish
        boolean waitForCompletion = job.waitForCompletion(true);
        System.exit(waitForCompletion ? 0 : 1);
    }
}
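Assuming the three classes above are packaged into a jar (the name family.jar below is hypothetical), the job would typically be submitted with the hadoop command; on a real cluster the hard-coded local paths in FamilyRunner would be replaced with HDFS paths:

```shell
# Submit the job; the jar name is illustrative, not from the original post.
# Note the output directory must not exist before the run.
hadoop jar family.jar com.company.family.FamilyRunner
```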