A Simple Hadoop Example

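1. The concept of cloud computing

In the narrow sense, cloud computing refers to the delivery and usage model of IT infrastructure: the required resources (hardware, platform, software) are obtained over the network, on demand and in an easily scalable way.

In the broad sense, cloud computing refers to the delivery and usage model of services: the required services are obtained over the network, on demand and in an easily scalable way. These services may be related to IT, software, or the Internet, or may be any other kind of service.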

2. The three-layer model

SaaS: more

PaaS: Hadoop

IaaS: OpenStack

3. Google vs. Hadoop

Google concept	Hadoop concept
MapReduce	Hadoop MapReduce
GFS	HDFS
Bigtable	HBase
Chubby	ZooKeeper


4. Writing the map and reduce functions in Hadoop

4.1 The map function

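// Imports used by this class (placed at the top of WordCount.java)
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the WordCount class
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line on whitespace
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);   // emit (word, 1) as the output key and value
        }
    }
}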

Note: the key and value types of the map output must match the key and value types of the reduce input. Here both are (Text, IntWritable).
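To make the map step concrete, here is a minimal plain-Java sketch that needs no Hadoop on the classpath. It mimics what TokenizerMapper emits for a single input line. The class name MapSketch and the method mapLine are hypothetical names chosen for illustration; they are not part of the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Illustration only: MapSketch and mapLine are hypothetical names, not Hadoop API.
public class MapSketch {

    // Mimics TokenizerMapper.map: split the line on whitespace and emit (word, 1) per token.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<String, Integer>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // A repeated word is emitted once per occurrence; the summing happens later, in reduce.
        for (Map.Entry<String, Integer> pair : mapLine("hello hadoop hello")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Note that the mapper does no counting of its own: "hello" appears twice in the output, each time with the value 1.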

4.2 The reduce function

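// Imports used by this class (placed at the top of WordCount.java)
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the WordCount class
public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();   // aggregate: add up all counts for this key
        }
        result.set(sum);
        context.write(key, result);
    }
}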

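Note: the key and value types of the map output must match the key and value types of the reduce input. That is, the last two type parameters of Mapper<Object, Text, Text, IntWritable> are the same as the first two type parameters of Reducer<Text, IntWritable, Text, IntWritable>.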
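The path from map output to reduce input can be sketched in plain Java as follows. This is only an illustration under simplifying assumptions: everything runs in one process, and a TreeMap stands in for the framework's shuffle and sort, which groups all values of the same key before reduce is called. ReduceSketch, shuffle, and reduce are hypothetical names, not Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: ReduceSketch and its methods are hypothetical names, not Hadoop API.
public class ReduceSketch {

    // Stand-in for shuffle/sort: each word arrives as (word, 1); group the 1s by word, sorted by key.
    static Map<String, List<Integer>> shuffle(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Mimics IntSumReducer.reduce: add up all values that belong to one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = shuffle(Arrays.asList("hello", "hadoop", "hello"));
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getKey(), e.getValue()));
        }
    }
}
```

The sketch also shows why the types have to line up: what shuffle hands to reduce is exactly what the map side produced, only grouped by key.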

4.3 Job configuration

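// Imports used by this method (placed at the top of WordCount.java)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Strip generic Hadoop options (e.g. -D key=value); what remains are <in> and <out>
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    // Job name. This constructor is deprecated in Hadoop 2.x and later,
    // where Job.getInstance(conf, "word count") is used instead.
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // input path
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  // output directory (must not already exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}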
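The configuration above registers IntSumReducer as both combiner and reducer. This works for word count because summing is associative and commutative, and because the reducer's input and output types are the same. The plain-Java sketch below illustrates the idea under simplifying assumptions: two input splits, everything in one process. CombinerSketch, mapAndCombine, and sumByKey are hypothetical names, not Hadoop API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustration only: CombinerSketch and its methods are hypothetical names, not Hadoop API.
public class CombinerSketch {

    // One map task plus its local combine: count the words of a single input split.
    static Map<String, Integer> mapAndCombine(String split) {
        Map<String, Integer> local = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(split);
        while (itr.hasMoreTokens()) {
            local.merge(itr.nextToken(), 1, Integer::sum);
        }
        return local;
    }

    // The reduce step: add up the partial counts coming from all map tasks.
    static Map<String, Integer> sumByKey(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = mapAndCombine("hello hadoop hello"); // sends (hello, 2), not two (hello, 1)
        Map<String, Integer> b = mapAndCombine("hello hdfs");
        System.out.println(sumByKey(Arrays.asList(a, b))); // prints {hadoop=1, hdfs=1, hello=3}
    }
}
```

The final totals are the same as they would be without the combine step; the combiner only reduces how many pairs have to travel from the map side to the reduce side.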

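5. Running from the command line

Steps:

a. Package the MapReduce code into a jar, wordcount.jar; assume the main class is named WordCount.

b. Change into the Hadoop installation directory.

c. Invocation: hadoop jar <local path to the jar> <class name> <HDFS input directory> <HDFS output directory>

For example: hadoop jar /home/deke/wordcount.jar WordCount <HDFS input directory> <HDFS output directory>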

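6. Eclipse configuration

Steps:

a. Download Eclipse.

b. Copy contrib/eclipse-plugin/hadoop-*-eclipse-plugin.jar from the Hadoop folder into the plugins folder of the Eclipse folder.

c. Start Eclipse.

d. Set the path of the Hadoop installation folder.

Under Window -> Preferences -> Hadoop Map/Reduce, set the location of the Hadoop files on Linux, for example /usr/hadoop.

e. Open Window -> Show View -> Other -> MapReduce Tool -> Map/Reduce Location. Right-click in the blank area of the Map/Reduce Location view and choose "New Map/Reduce Location". In the dialog that appears, fill in the ports according to core-site.xml and mapred-site.xml.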

Reposted from: hadoop基础学习（一） (Hadoop Basics, Part 1)
