Analyzing How MapReduce Works by Debugging

Stepping through map and reduce in a debugger shows that the core of each phase is the loop inside its run method, quoted here from the source:
	

	// From the run method in the Mapper source

	while (context.nextKeyValue()) {
		map(context.getCurrentKey(), context.getCurrentValue(), context);
	}

	// From the run method in the Reducer source

	while (context.nextKey()) {
		reduce(context.getCurrentKey(), context.getValues(), context);
		// If a back up store is used, reset it
		Iterator<VALUEIN> iter = context.getValues().iterator();
		if (iter instanceof ReduceContext.ValueIterator) {
			((ReduceContext.ValueIterator<VALUEIN>) iter).resetBackupStore();
		}
	}
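To make the shape of the Mapper loop easier to see outside Hadoop, here is a minimal sketch in plain Java. It is not Hadoop code: `LineContext`, `MiniMapper`, and `RunLoopDemo` are hypothetical names invented for this illustration, and the key is a simple line index rather than the byte offset a real text input would supply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a map context: it hands out records one at a time.
class LineContext {
    private final Iterator<String> lines;
    private long index = -1;   // key of the current record (a line index, an assumption of this sketch)
    private String current;    // value of the current record
    final List<String> output = new ArrayList<>();

    LineContext(List<String> input) {
        this.lines = input.iterator();
    }

    // Advances to the next record; returns false once the input is exhausted.
    boolean nextKeyValue() {
        if (!lines.hasNext()) {
            return false;
        }
        current = lines.next();
        index++;
        return true;
    }

    long getCurrentKey() { return index; }
    String getCurrentValue() { return current; }
    void write(String key, int value) { output.add(key + "=" + value); }
}

class MiniMapper {
    // Same shape as the Mapper loop quoted above: one map() call per record.
    void run(LineContext context) {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    }

    // Emits the line length so that every call leaves a visible trace.
    void map(long key, String value, LineContext context) {
        context.write("line" + key, value.length());
    }
}

class RunLoopDemo {
    public static void main(String[] args) {
        LineContext context = new LineContext(Arrays.asList("ab", "cde"));
        new MiniMapper().run(context);
        System.out.println(context.output); // prints [line0=2, line1=3]
    }
}
```

Setting a breakpoint inside `map` here behaves much like it does in a real job: the debugger stops once per record, and the loop ends when `nextKeyValue()` returns false.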

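This shows that the Mapper handles its input one record at a time. For line-oriented text input, each record is one line: map is called once for that line, and when it returns, the loop checks whether another line is available. Within this phase: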

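1. The mapper only splits the data in a single line into units. In a word-count style job, every unit is emitted with the value 1.
2. After the map phase, there may be a great many <key,value> pairs that share the same key.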
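The two points above can be sketched in plain Java for a word-count style job. This is only an illustration under stated assumptions: `WordCountMapSketch` and `mapLine` are hypothetical names, words are assumed to be separated by whitespace, and no Hadoop types are used.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

class WordCountMapSketch {
    // Splits one line into words and emits a (word, 1) pair for each of them.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Repeated words produce repeated keys; nothing is merged at this stage.
        System.out.println(mapLine("to be or not to be"));
        // prints [to=1, be=1, or=1, not=1, to=1, be=1]
    }
}
```

Note that "to" and "be" each appear twice in the output: the map step never adds anything up, it only emits pairs.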

On the Reducer side, by contrast, all pairs that share the same key are brought together:

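3. reduce operates on <key, value> units: each call receives one key together with the values for that key (the `context.getValues()` argument in the Reducer loop).
4. Pairs whose keys are equal are merged into one larger unit of work, so reduce is called once per distinct key.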
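The grouping described in points 3 and 4 can be sketched in plain Java as follows. The names `WordCountReduceSketch`, `groupByKey`, `reduce`, and `run` are hypothetical, and a `TreeMap` is used only as a simple stand-in for grouping pairs by key in sorted order; it is not how Hadoop itself merges map output.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountReduceSketch {
    // Merges pairs with equal keys into one unit: key -> all values seen for it.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // One call per distinct key: sums every value that arrived under that key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Mirrors the Reducer loop: iterate over the keys and call reduce for each one.
    static Map<String, Integer> run(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groupByKey(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : "to be or not to be".split(" ")) {
            pairs.add(new SimpleEntry<>(word, 1));
        }
        System.out.println(run(pairs)); // prints {be=2, not=1, or=1, to=2}
    }
}
```

Six input pairs lead to only four `reduce` calls, one per distinct word, which is the merging behaviour visible when stepping through the Reducer loop.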

又决定放弃
