I. Configure the Windows environment
1. Set the environment variables
HADOOP_HOME=
PATH=%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin;
Then edit the related files under etc\hadoop in the Hadoop install directory.
2. hadoop-env.cmd
set JAVA_HOME=D:\Programming\Jdk\jdk1_8_91
3. core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/C:/hadoop-2.8.0/work/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>
4. hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>localhost:9001</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/C:/hadoop-2.8.0/work/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/C:/hadoop-2.8.0/work/hdfs/data</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
5. mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>localhost:19888</value>
    </property>
</configuration>
6. yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>localhost:8088</value>
    </property>
</configuration>
7. slaves
localhost
8. Format the NameNode
hdfs namenode -format
9. Start Hadoop
start-all.cmd
10. Verify that Hadoop started successfully
http://localhost:50070/
II. Configure IDEA + Maven + Hadoop
1. Create a Maven project in IDEA and configure pom.xml
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.7.1</hadoop.version>
</properties>
<dependencies>
    <!-- test -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.10</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.6.4</version>
    </dependency>
    <!-- hadoop -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>
2. Create the class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Created by Quiet on 2017/6/8.
 */
public class OilTotal {

    public static class OilMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Fuel-card info: the first pipe-delimited field of the record
            String info = value.toString().split("\\|")[0];
            context.write(new Text(info), new IntWritable(1));
        }
    }

    public static class OilReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // The HDFS communication address comes from the configuration (fs.defaultFS)
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("oilJob");
        job.setJarByClass(OilTotal.class);
        job.setMapperClass(OilMapper.class);
        job.setReducerClass(OilReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output paths on the Hadoop server (HDFS)
        String path = "hdfs://localhost:9000/bigdata/oil/";
        FileInputFormat.addInputPath(job, new Path(path));
        FileOutputFormat.setOutputPath(job, new Path(path + "result"));

        /* // Input and output paths on the local filesystem
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1])); */

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
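The mapper's key extraction and the reducer's summation can be tried outside Hadoop. Below is a minimal plain-Java sketch; the sample records are made up, and the only assumption taken from the source is that the fuel-card id is the first pipe-delimited field:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OilSplitDemo {
    // Mirrors OilMapper: take the first pipe-delimited field as the key.
    static String extractCard(String line) {
        return line.split("\\|")[0];
    }

    public static void main(String[] args) {
        // Hypothetical records in a card|amount|date layout
        String[] lines = {
                "card_001|50|2017-06-01",
                "card_002|30|2017-06-01",
                "card_001|20|2017-06-02"};
        // Mirrors OilReducer: sum the 1s emitted per key.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(extractCard(line), 1, Integer::sum);
        }
        System.out.println(counts); // {card_001=2, card_002=1}
    }
}
```

The real job computes the same per-card counts; MapReduce only distributes the work across mappers and groups the values by key before the reducer runs.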
III. Two ways to configure the job's input and output
1. Use paths on the Hadoop server: the input files must first be uploaded to HDFS, and the results are also written to HDFS
String path = "hdfs://localhost:9000/bigdata/oil/";
FileInputFormat.addInputPath(job, new Path(path));
FileOutputFormat.setOutputPath(job, new Path(path + "result"));
2. Use local input and output paths
Both the input files and the results stay on the local filesystem.
Create an input folder in the project and put the input files there; the output directory is created automatically when the program runs and holds the results.
The paths are handled in the program as follows:
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
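For the local mode, the input folder has to exist with data in it before the first run. A small sketch that prepares it; the file name oil.txt and the record layout are illustrative assumptions, not taken from the source:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class PrepareLocalInput {
    // Creates the input directory and drops one pipe-delimited sample file into it.
    static Path writeSample(Path inputDir) throws IOException {
        Files.createDirectories(inputDir);
        List<String> records = Arrays.asList(
                "card_001|50|2017-06-01",   // hypothetical card|amount|date records
                "card_001|20|2017-06-02");
        return Files.write(inputDir.resolve("oil.txt"), records);
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSample(Paths.get("input"));
        System.out.println("wrote " + Files.readAllLines(file).size() + " records to " + file);
    }
}
```

With this in place, setting the program arguments in IDEA to `input output` matches the args[0]/args[1] usage above.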
IV. Run the job
A successful run prints counters such as:
Map-Reduce Framework
Map input records=20
Map output records=20
Map output bytes=880
Map output materialized bytes=926
Input split bytes=116
Combine input records=0
Combine output records=0
Reduce input groups=10
Reduce shuffle bytes=926
Reduce input records=20
Reduce output records=10
Spilled Records=40
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
V. Common problems
1. Running the Hadoop 2 WordCount.java code produces the following error:
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
    at ...
Analysis:
Hadoop 2.x download packages do not ship winutils.exe in the bin directory.
Copy the contents of hadoopbin_for_hadoop2.7.1 into the bin directory, put hadoop.dll into C:\WINDOWS\SYSTEM32, restart the machine, and run again.
2. The output directory already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/E:/projectcty/IDEA/bigdata/output already exists
Delete the output directory before running again; the program creates it automatically at run time.
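For HDFS output paths, the usual fix is to call Hadoop's FileSystem.delete(outputPath, true) before submitting the job. For the local case shown in the exception (a file:/ path), a plain-Java sketch that clears the stale output directory; the path name "output" is an assumption matching the local-mode setup above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanOutputDir {
    // Recursively deletes the job's previous output directory, if present.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder())   // children before parents
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("output");   // hypothetical local output path
        deleteRecursively(out);
        System.out.println("output exists: " + Files.exists(out));
    }
}
```

Calling this (or the FileSystem equivalent) at the top of main avoids the FileAlreadyExistsException on repeated runs.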