Java: JDK 1.7.0_71
Hadoop: hadoop-2.5.2
Linux: CentOS 6.4 64-bit
For now we set up three machines, with the following IP addresses:
192.168.40.138 master
192.168.40.137 slave-1
192.168.40.136 slave-2
I. Prerequisite Environment Setup
1. Create the hadoop user
Run the following as the root user:
$useradd -d /home/hadoop -s /bin/bash hadoop
$passwd hadoop
(passwd prompts for the new password; enter it twice, e.g. hadoop)
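If you would rather set the password non-interactively (useful when scripting this step across all three machines), chpasswd reads user:password pairs from stdin; a minimal sketch, assuming the password is literally hadoop:
$echo 'hadoop:hadoop' | chpasswd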
2. Disable the firewall (required on every machine)
$chkconfig iptables off
Disable SELinux:
$vi /etc/selinux/config
SELINUX=disabled
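Note that chkconfig only takes effect on the next boot, and the SELINUX= edit only applies after a reboot. To apply both immediately, without waiting for the reboot in step 5, the standard CentOS 6 commands are:
$service iptables stop
$setenforce 0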
3. Change the hostnames
$vi /etc/sysconfig/network
Set the hostname of the three machines to master, slave-1, and slave-2 respectively.
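For example, on master the file should contain the line below; the hostname command applies the new name to the running session without waiting for the reboot:
HOSTNAME=master
$hostname master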
4. Configure the hosts file
$vi /etc/hosts
Append:
192.168.40.138 master
192.168.40.137 slave-1
192.168.40.136 slave-2
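A quick sanity check, run from each machine, that the names resolve (output omitted):
$ping -c 1 master
$ping -c 1 slave-1
$ping -c 1 slave-2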
5. Reboot the machines
$reboot
6. Set up passwordless SSH between the nodes
1. Edit the SSH configuration file
$vi /etc/ssh/sshd_config
Find the following lines and uncomment them (remove the leading #):
RSAAuthentication yes                      # allow RSA authentication
PubkeyAuthentication yes                   # allow public-key authentication
AuthorizedKeysFile .ssh/authorized_keys    # public keys live in .ssh/authorized_keys
2. Restart the SSH service
$/etc/init.d/sshd restart
3. Switch to the hadoop user and run the following on master:
$ssh-keygen -t rsa
Press Enter at each prompt to accept the default key location and an empty passphrase.
On slave-1:
$ssh-keygen -t rsa
$scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:~/.ssh/id_rsa.pub.slave-1
On slave-2:
$ssh-keygen -t rsa
$scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:~/.ssh/id_rsa.pub.slave-2
On master (run these inside /home/hadoop/.ssh, since the paths below are relative):
$cat id_rsa.pub >> authorized_keys
$cat id_rsa.pub.slave-1 >> authorized_keys
$cat id_rsa.pub.slave-2 >> authorized_keys
$scp authorized_keys hadoop@slave-1:~/.ssh/
$scp authorized_keys hadoop@slave-2:~/.ssh/
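As an alternative to the manual scp-and-cat exchange above, CentOS 6 ships ssh-copy-id, which appends your public key to the remote authorized_keys directly; running these three commands on each of the three machines achieves the same full mesh:
$ssh-copy-id hadoop@master
$ssh-copy-id hadoop@slave-1
$ssh-copy-id hadoop@slave-2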
On every machine, fix the permissions of these two paths (sshd refuses keys that are group- or world-readable):
$chmod 600 ~/.ssh/authorized_keys
$chmod 700 ~/.ssh/
Test that passwordless SSH works (answer yes to the host-key prompt on the first connection):
$ssh master
$ssh slave-1
$ssh slave-2
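To verify non-interactively that no password prompt remains, ssh's BatchMode option makes the connection fail instead of prompting; each command should simply print the remote hostname:
$ssh -o BatchMode=yes slave-1 hostname
$ssh -o BatchMode=yes slave-2 hostname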
7. Install the JDK
Install the JDK on every machine in the cluster and set JAVA_HOME.
$mkdir /usr/java
Upload jdk-7u71-linux-x64.tar to /usr/java, then:
$tar -xvf /usr/java/jdk-7u71-linux-x64.tar -C /usr/java
$rm -rf /usr/java/jdk-7u71-linux-x64.tar
$mv /usr/java/jdk1.7.0_71 /usr/java/jdk1.7
Set JAVA_HOME:
$vi /etc/profile
Append the following:
export JAVA_HOME=/usr/java/jdk1.7
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=.:$JAVA_HOME/bin:$PATH
Then reload the profile and verify that the Java environment is set up:
$source /etc/profile
$java -version
II. Hadoop Installation
1. Extract Hadoop
On master:
Upload hadoop-2.5.2.tar to /home/hadoop and extract it:
$tar -xvf hadoop-2.5.2.tar
$rm -rf hadoop-2.5.2.tar
2. Create the directories Hadoop needs
Create the following directories (on every node); they are referenced by hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir in the configuration below:
$mkdir -p /home/hadoop/tmp
$mkdir -p /home/hadoop/dfs/name
$mkdir -p /home/hadoop/dfs/data
3. Configure HADOOP_HOME
$vi /etc/profile
Append:
export HADOOP_HOME=/home/hadoop/hadoop-2.5.2/
export PATH=$PATH:$HADOOP_HOME/bin
Then reload with $source /etc/profile. Repeat this on the slave nodes as well; the scp in step 7 copies the Hadoop directory but not /etc/profile.
4. Edit the slaves file
$vi /home/hadoop/hadoop-2.5.2/etc/hadoop/slaves
Add the following (these are the hosts that will run the DataNode and NodeManager daemons):
slave-1
slave-2
5. Point Hadoop at JAVA_HOME
$vi /home/hadoop/hadoop-2.5.2/etc/hadoop/hadoop-env.sh
Find the export JAVA_HOME line and set it to /usr/java/jdk1.7
6. Edit the Hadoop configuration files
1) Edit core-site.xml
$vi /home/hadoop/hadoop-2.5.2/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
2) Edit hdfs-site.xml
$vi /home/hadoop/hadoop-2.5.2/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/dfs/data</value>
  </property>
  <property>
    <!-- with only two DataNodes (slave-1 and slave-2), a replication factor of 3 can never be satisfied -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
3) Edit mapred-site.xml
$cd /home/hadoop/hadoop-2.5.2/etc/hadoop
$mv mapred-site.xml.template mapred-site.xml
$vi /home/hadoop/hadoop-2.5.2/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
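Note that the JobHistory server configured by the two mapreduce.jobhistory.* properties is not launched by the start-all.sh used in step 9; in Hadoop 2.x it is started separately with its own daemon script:
$/home/hadoop/hadoop-2.5.2/sbin/mr-jobhistory-daemon.sh start historyserver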
4) Edit yarn-site.xml
$vi /home/hadoop/hadoop-2.5.2/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
7. Copy the software to the other nodes
$scp -r /home/hadoop/hadoop-2.5.2 slave-1:/home/hadoop/
$scp -r /home/hadoop/hadoop-2.5.2 slave-2:/home/hadoop/
8. Format the HDFS filesystem
Run this once, on master only:
$hdfs namenode -format
9. Start Hadoop
$/home/hadoop/hadoop-2.5.2/sbin/start-all.sh
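start-all.sh still works in Hadoop 2.x but prints a deprecation warning; the equivalent, preferred form is to start HDFS and YARN separately:
$/home/hadoop/hadoop-2.5.2/sbin/start-dfs.sh
$/home/hadoop/hadoop-2.5.2/sbin/start-yarn.sh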
10. Verify the daemons
On master, the following three processes indicate a successful start:
$jps
41837 SecondaryNameNode
41979 ResourceManager
41661 NameNode
On each slave, the following two processes indicate a successful start:
$jps
4543 DataNode
4635 NodeManager
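For a final end-to-end check, the HDFS report should list both DataNodes, a trivial write should succeed, and the web UIs should be reachable (50070 is the Hadoop 2.x NameNode default; 8088 is set explicitly in yarn-site.xml above):
$hdfs dfsadmin -report
$hdfs dfs -mkdir /test
$hdfs dfs -ls /
NameNode UI: http://master:50070
ResourceManager UI: http://master:8088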