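I only recently started working with Hadoop and needed to use MapWritable. When I used it directly, however, the output in part-r-00000 looked like the following (the sample is taken from https://blog.csdn.net/jiyuanyi1992/article/details/37739413 , because my own code has since been modified and overwritten):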
key1 org.apache.hadoop.io.MapWritable@396cbd97
key2 org.apache.hadoop.io.MapWritable@17991de1
key3 org.apache.hadoop.io.MapWritable@18f63055
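The cause is that MapWritable here falls back to the default toString() inherited from Object, which prints the class name and a hash code instead of the key/value pairs stored in the map. The fix is to create a new Java class that extends MapWritable and overrides its toString() method. Taking my own code as the example, my MapWritable stores Text keys and IntWritable values, so the new class looks like this: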
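package mySimijoin;

import java.util.Set;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Writable;

// MapWritable subclass whose toString() prints the stored pairs.
public class myMapWritable extends MapWritable {
    @Override
    public String toString() {
        StringBuilder s = new StringBuilder("{ ");
        Set<Writable> keys = this.keySet();
        boolean first = true;
        for (Writable key : keys) {
            // Values are assumed to be IntWritable, as in my job.
            IntWritable count = (IntWritable) this.get(key);
            if (!first) {
                s.append(", "); // separate entries so they do not run together
            }
            s.append(key.toString()).append(" ").append(count.toString());
            first = false;
        }
        s.append(" }");
        return s.toString();
    }
}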
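The loop in the middle is what traverses the MapWritable. Given a key, MapWritable.get(key) returns its value. When the keys are not known in advance, call keySet() first to obtain the set of keys, then iterate over that set and look up the value for each key; that gives a full traversal of the MapWritable.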
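As a side note on the traversal, here is a minimal sketch of the same formatting done with entrySet() instead of keySet() plus get(key). It is written against plain java.util.Map so that it runs without Hadoop on the classpath; since MapWritable implements Map<Writable, Writable>, the same loop should apply to it. The class name MapFormat and the method format are hypothetical helpers made up for this illustration, not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not a Hadoop class): formats any Map as "{ k v, k v }".
final class MapFormat {
    static String format(Map<?, ?> map) {
        // StringJoiner inserts the ", " separator and the surrounding braces.
        StringJoiner joiner = new StringJoiner(", ", "{ ", " }");
        // entrySet() yields each key together with its value,
        // so no separate get(key) lookup is needed.
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            joiner.add(entry.getKey() + " " + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, which makes the output predictable.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hello", 1);
        counts.put("hi", 11);
        System.out.println(format(counts)); // prints: { hello 1, hi 11 }
    }
}
```

Inside a MapWritable subclass, toString() could simply return the result of such a loop over this.entrySet(); that also avoids the cast to IntWritable, because each value is printed through its own toString().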
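That takes care of the MapWritable output problem. Since MapWritable has come up, here are a few more details on storing and reading data in it.
A MapWritable can hold more than simple Writable values such as Text or IntWritable; it can also hold other MapWritable objects. Its keys and values are both typed as Writable, so any class that implements Writable can be used, as shown below: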
MapWritable mapWritable = new MapWritable();
MapWritable mapWritable1 = new MapWritable();
MapWritable mapWritable2 = new MapWritable();
Text text1 = new Text("hello");
IntWritable intWritable1 = new IntWritable(1);
Text text2 = new Text("hi");
IntWritable intWritable2 = new IntWritable(11);
mapWritable1.put(text1, intWritable1);
mapWritable2.put(text2, intWritable2);
mapWritable.put(mapWritable1, mapWritable2);
Traversal is similar and, for nested maps, can be done with two keySet() calls. My reduce function, for example, looks like this:
public static class myReducer extends Reducer<Text, myMapWritable, Text, myMapWritable> {
    @Override
    public void reduce(Text key, Iterable<myMapWritable> values, Context context)
            throws IOException, InterruptedException {
        // Collect the entries of every incoming map into one output map.
        myMapWritable tmp = new myMapWritable();
        for (myMapWritable val : values) {
            for (Writable valkey : val.keySet()) {
                IntWritable intWritable = (IntWritable) val.get(valkey);
                // Copy the key into a new Text; a repeated key keeps the last value seen.
                tmp.put(new Text(valkey.toString()), intWritable);
            }
        }
        context.write(key, tmp);
    }
}
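One thing to be aware of in the reduce function above: it uses put(), so when the same key shows up in several incoming maps, the later value replaces the earlier one. If the intent were to add the counts up instead (an assumption on my part; the job above does not require it), the merge step could be sketched as follows. Plain String and Integer stand in for Text and IntWritable so the snippet runs without Hadoop, and the names CountMerge and mergeCounts are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): merges several count maps by summing.
final class CountMerge {
    static Map<String, Integer> mergeCounts(List<Map<String, Integer>> maps) {
        Map<String, Integer> total = new LinkedHashMap<>();
        for (Map<String, Integer> m : maps) {
            for (Map.Entry<String, Integer> e : m.entrySet()) {
                // merge() stores the value if the key is new,
                // otherwise it adds the value to the existing count.
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new LinkedHashMap<>();
        first.put("hello", 1);
        Map<String, Integer> second = new LinkedHashMap<>();
        second.put("hello", 2);
        second.put("hi", 11);
        System.out.println(mergeCounts(Arrays.asList(first, second))); // prints: {hello=3, hi=11}
    }
}
```

In a real reducer the equivalent would read the existing IntWritable for the key from the output map, add the incoming value, and put a new IntWritable back.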
Of course, toString() may also need to be overridden to suit your own requirements.
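For the nested case shown earlier, where both the key and the value are themselves maps, one option is a recursive formatter. The following is only a sketch under stated assumptions: plain java.util maps stand in for MapWritable so it runs without Hadoop, and the class NestedMapFormat, its format method, and the " -> " separator are all made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not a Hadoop class): prints maps that may contain maps.
final class NestedMapFormat {
    static String format(Object value) {
        // Anything that is not a map is printed through its own toString().
        if (!(value instanceof Map)) {
            return String.valueOf(value);
        }
        StringJoiner joiner = new StringJoiner(", ", "{ ", " }");
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
            // Recurse on both sides, because a key can be a map as well.
            joiner.add(format(entry.getKey()) + " -> " + format(entry.getValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> inner1 = new LinkedHashMap<>();
        inner1.put("hello", 1);
        Map<String, Integer> inner2 = new LinkedHashMap<>();
        inner2.put("hi", 11);
        Map<Object, Object> outer = new LinkedHashMap<>();
        outer.put(inner1, inner2);
        System.out.println(format(outer)); // prints: { { hello -> 1 } -> { hi -> 11 } }
    }
}
```

Because the method checks for Map rather than a concrete class, the same recursion should also descend into MapWritable values, which implement Map<Writable, Writable>.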