I. Configure the Windows environment
1. Set the environment variables
HADOOP_HOME=
PATH=%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin;
Then edit the related files under etc\hadoop in the Hadoop install directory.
2. hadoop-env.cmd
set JAVA_HOME=D:\Programming\Jdk\jdk1_8_91
3. core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/C:/hadoop-2.8.0/work/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>
4. hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>localhost:9001</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/C:/hadoop-2.8.0/work/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/C:/hadoop-2.8.0/work/hdfs/data</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
5. mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>localhost:19888</value>
    </property>
</configuration>
6. yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>localhost:8088</value>
    </property>
</configuration>
7. slaves
localhost
8. Format the NameNode
hdfs namenode -format
9. Start Hadoop
start-all.cmd
10. Verify that Hadoop started successfully
http://localhost:50070/
II. Configure IDEA + Maven + Hadoop
1. Create a Maven project in IDEA and configure pom.xml
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.7.1</hadoop.version>
</properties>
<dependencies>
    <!-- test -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.10</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.6.4</version>
    </dependency>
    <!-- hadoop -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>
2. Create the class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Created by Quiet on 2017/6/8.
 */
public class OilTotal {

    public static class OilMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Fuel-card info: the first pipe-delimited field of the record
            String info = value.toString().split("\\|")[0];
            context.write(new Text(info), new IntWritable(1));
        }
    }

    public static class OilReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // The HDFS communication address comes from the configuration (fs.defaultFS)
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("oilJob");
        job.setJarByClass(OilTotal.class);
        job.setMapperClass(OilMapper.class);
        job.setReducerClass(OilReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output paths on the Hadoop server (HDFS)
        String path = "hdfs://localhost:9000/bigdata/oil/";
        FileInputFormat.addInputPath(job, new Path(path));
        FileOutputFormat.setOutputPath(job, new Path(path + "result"));

        /* // Input and output paths on the local filesystem
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1])); */

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
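The mapper's key extraction and the reducer's summation can be tried outside Hadoop. Below is a minimal plain-Java sketch; the sample records are made up, and the only assumption taken from the source is that the fuel-card id is the first pipe-delimited field:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OilSplitDemo {
    // Mirrors OilMapper: take the first pipe-delimited field as the key.
    static String extractCard(String line) {
        return line.split("\\|")[0];
    }

    public static void main(String[] args) {
        // Hypothetical records in a card|amount|date layout
        String[] lines = {
                "card_001|50|2017-06-01",
                "card_002|30|2017-06-01",
                "card_001|20|2017-06-02"};
        // Mirrors OilReducer: sum the 1s emitted per key.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(extractCard(line), 1, Integer::sum);
        }
        System.out.println(counts); // {card_001=2, card_002=1}
    }
}
```

The real job computes the same per-card counts; MapReduce only distributes the work across mappers and groups the values by key before the reducer runs.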
III. Two ways to configure the job's input and output
1. Use paths on the Hadoop server: the input files must first be uploaded to HDFS, and the results are also written to HDFS
String path = "hdfs://localhost:9000/bigdata/oil/";
FileInputFormat.addInputPath(job, new Path(path));
FileOutputFormat.setOutputPath(job, new Path(path + "result"));
2. Use local input and output paths
Both the input files and the results stay on the local filesystem.
Create an input folder in the project and put the input files there; the output directory is created automatically when the program runs and holds the results.
The paths are handled in the program as follows:
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
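For the local mode, the input folder has to exist with data in it before the first run. A small sketch that prepares it; the file name oil.txt and the record layout are illustrative assumptions, not taken from the source:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class PrepareLocalInput {
    // Creates the input directory and drops one pipe-delimited sample file into it.
    static Path writeSample(Path inputDir) throws IOException {
        Files.createDirectories(inputDir);
        List<String> records = Arrays.asList(
                "card_001|50|2017-06-01",   // hypothetical card|amount|date records
                "card_001|20|2017-06-02");
        return Files.write(inputDir.resolve("oil.txt"), records);
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSample(Paths.get("input"));
        System.out.println("wrote " + Files.readAllLines(file).size() + " records to " + file);
    }
}
```

With this in place, setting the program arguments in IDEA to `input output` matches the args[0]/args[1] usage above.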
IV. Run the job
A successful run prints counters such as:
Map-Reduce Framework
Map input records=20
Map output records=20
Map output bytes=880
Map output materialized bytes=926
Input split bytes=116
Combine input records=0
Combine output records=0
Reduce input groups=10
Reduce shuffle bytes=926
Reduce input records=20
Reduce output records=10
Spilled Records=40
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
V. Common problems
1. Running the Hadoop 2 WordCount.java code produces the following error:
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
    at ...
Analysis:
Hadoop 2.x download packages do not ship winutils.exe in the bin directory.
Copy the contents of hadoopbin_for_hadoop2.7.1 into the bin directory, put hadoop.dll into C:\WINDOWS\SYSTEM32, restart the machine, and run again.
2. The output directory already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/E:/projectcty/IDEA/bigdata/output already exists
Delete the output directory before running again; the program creates it automatically at run time.
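For HDFS output paths, the usual fix is to call Hadoop's FileSystem.delete(outputPath, true) before submitting the job. For the local case shown in the exception (a file:/ path), a plain-Java sketch that clears the stale output directory; the path name "output" is an assumption matching the local-mode setup above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanOutputDir {
    // Recursively deletes the job's previous output directory, if present.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder())   // children before parents
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("output");   // hypothetical local output path
        deleteRecursively(out);
        System.out.println("output exists: " + Files.exists(out));
    }
}
```

Calling this (or the FileSystem equivalent) at the top of main avoids the FileAlreadyExistsException on repeated runs.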