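Note: the figures missing from this article are available in the attached DOC file.
Setting up a Hadoop 2.7.1 cluster
1. System configuration
Computer 1 (Lenovo), Windows 7 64-bit, 8 GB RAM; a virtual machine on this computer runs the name system.
Computer 1 (Lenovo), Windows 7 64-bit, 8 GB RAM; a virtual machine on this computer runs the standby name system (sname).
Computer 1 (Lenovo), Windows 7 64-bit, 8 GB RAM; a virtual machine on this computer runs the amrm system.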
Virtual machine: VMware 12.0
Hadoop 2.7.1
ZooKeeper 3.4.6
2. Cluster planning
The plan is as follows:
The JournalServer should be a dedicated machine; the slaves file lists the JournalNodes (which store the NameNode metadata).
ZooKeeper, name, and standby name are configured on the JournalServer and the JournalNodes.
Hostname | IP | Installed software | Running processes
name | 192.168.32.137 | JDK, Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, DataNode, JobHistoryServer, NodeManager, JournalNode, QuorumPeerMain
sname | 192.168.32.135 | JDK, Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, DataNode, NodeManager, JournalNode, QuorumPeerMain
amrm | 192.168.32.136 | JDK, Hadoop, ZooKeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain, ResourceManager
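Explanation:
In Hadoop 2.0 an HA setup usually consists of two NameNodes, one in the active state and one in the standby state. The active NameNode serves clients; the standby NameNode does not, and only synchronizes the active NameNode's state so that it can take over quickly when the active one fails. Hadoop 2.0 officially provides two HDFS HA solutions, one based on NFS and one based on QJM. Here we use the simpler QJM. In this scheme the active and standby NameNodes synchronize metadata through a group of JournalNodes, and a record counts as written once it has been written successfully to a majority of the JournalNodes. An odd number of JournalNodes is usually configured.
A ZooKeeper cluster is also configured here for ZKFC (DFSZKFailoverController) failover: when the active NameNode goes down, the standby NameNode is automatically switched to the active state.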
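The majority rule for JournalNodes can be made concrete with a little arithmetic. The sketch below is not part of the original setup; the function names quorum_size and tolerated_failures are made up for illustration. With N JournalNodes a write needs floor(N/2) + 1 acknowledgements, so the group keeps accepting writes as long as no more than N minus that many nodes are down.

```shell
# Illustrative helpers (hypothetical names, not Hadoop commands).

# Number of JournalNodes that must acknowledge a write: floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of JournalNodes that may fail while writes still succeed.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the numbers for the 3-node layout of this article and for 4 nodes.
for n in 3 4; do
  echo "$n JournalNodes: quorum $(quorum_size "$n"), tolerates $(tolerated_failures "$n") failure(s)"
done
```

Four nodes tolerate no more failures than three do, which is why an odd count is the usual choice.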
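1) On name, sname, and amrm, run vim /etc/hostname and set the hostname to name, sname, and amrm respectively (see the figure in the attachment).
2) On name, sname, and amrm, run vim /etc/hosts and map the hostnames name, sname, and amrm to their IP addresses (see the figure in the attachment).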
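Since the figure for this step is missing, here is a minimal sketch of the mapping, using the IP addresses from the planning table in section 2. The helper name append_cluster_hosts is hypothetical; it takes the target file as an argument so that it can be tried on a scratch copy before touching /etc/hosts.

```shell
# Hypothetical helper: append the three host mappings from the planning
# table to a hosts-style file, skipping hostnames that are already present.
append_cluster_hosts() {
  hosts_file="$1"
  while read -r ip host; do
    # -w matches the hostname as a whole word, so "name" does not match "sname".
    if ! grep -qw "$host" "$hosts_file" 2>/dev/null; then
      printf '%s\t%s\n' "$ip" "$host" >> "$hosts_file"
    fi
  done <<EOF
192.168.32.137 name
192.168.32.135 sname
192.168.32.136 amrm
EOF
}

# Usage (try on a copy first, then run as root against /etc/hosts):
#   cp /etc/hosts /tmp/hosts.test && append_cluster_hosts /tmp/hosts.test
```

Running the function twice leaves the file unchanged the second time, so it is safe to repeat on every node.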
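3) Verify that the systems can ping each other.
4) Install SSH and generate a key pair on name (you can copy ~/.ssh to sname and amrm so that all nodes share the same key pair):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Copy the public key to sname and amrm and do the same there (ideally all nodes share one key pair):
scp -r /root/.ssh root@sname:/root/
scp -r /root/.ssh/id_dsa.pub root@sname:/root/.ssh/id_dsa.pub
scp -r /root/.ssh/id_dsa.pub root@amrm:/root/.ssh/id_dsa.pub
Check with ssh sname and ssh amrm that the nodes can reach each other without a password. If the slaves file includes the local node itself, also run: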
ssh name
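The check above can be scripted so that it is easy to repeat on every node. This is only a sketch: the function name check_passwordless and the SSH_BIN override are assumptions introduced here, while the BatchMode and ConnectTimeout options are standard OpenSSH client options.

```shell
# Hypothetical helper: report which hosts accept non-interactive key login.
# SSH_BIN can be overridden (for example for a dry run); it defaults to ssh.
SSH_BIN="${SSH_BIN:-ssh}"

check_passwordless() {
  failed=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "$host: ok"
    else
      echo "$host: not ready (password prompt, unknown host key, or unreachable)"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on any node of this cluster:
#   check_passwordless name sname amrm
```

The function returns a non-zero status if any host fails, so it can gate the later start scripts.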
---------------------------------------------------------------------------------------------------------------------------------------------
scp commands:
# scp from a remote source to a local destination
scp root@data:/root/.ssh/id_dsa.pub ~/.ssh/data_dsa.pub
# scp from a local source to a remote destination
scp -r /root/.ssh/id_dsa.pub root@amrm:/root/.ssh/id_dsa.pub
---------------------------------------------------------------------------------------------------------------------------------------------
Note: scp only works once SSH connectivity is in place.
Errors:
-1. Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /root/.ssh/known_hosts:2
remove with:
ssh-keygen -f "/root/.ssh/known_hosts" -R sname
Run:
ssh-keygen -f "/root/.ssh/known_hosts" -R sname
or delete line 2 of /root/.ssh/known_hosts.
-2. Warning: the ECDSA host key for 'sname' differs from the key for the IP address '192.168.32.138'
Offending key for IP in /root/.ssh/known_hosts:2
Matching host key in /root/.ssh/known_hosts:5
Are you sure you want to continue connecting (yes/no)? yes
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
* Documentation: https://help.ubuntu.com/
82 packages can be updated.
42 updates are security updates.
Last login: Fri Dec 11 22:30:00 2015 from 192.168.32.138
Fix: delete line 2 of /root/.ssh/known_hosts.
-3. The private key id_dsa has permissions 755 and cannot be used
chmod 700 ~/.ssh/id_dsa   (restrict the permissions of the private key file)
5) Disable IPv6
-1.
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
A value of 0 means IPv6 is enabled; 1 means it is disabled.
-2. Add the following lines to /etc/sysctl.conf, then reboot.
# disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
-3. sudo vim /etc/default/grub
-4. In that file, change
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
to
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 quiet splash"
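-5. Save with :wq, then run sudo update-grub to apply the change.
-6. Reboot so that the new kernel parameter takes effect; IPv6 is then disabled.
You can run
ip a | grep inet6
to check the result; no output means IPv6 has been disabled.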
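The flag check from step -1 can be wrapped in a small function for use in scripts. This is a sketch under assumptions: the name ipv6_status is hypothetical, and the path is a parameter only so that the function can be exercised on a test file.

```shell
# Hypothetical helper: interpret the disable_ipv6 flag.
# The file path is a parameter so the function can be tried on a test file.
ipv6_status() {
  flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
  if [ ! -r "$flag_file" ]; then
    # With ipv6.disable=1 on the kernel command line the file may not exist at all.
    echo "unknown"
    return 0
  fi
  case "$(cat "$flag_file")" in
    0) echo "enabled" ;;
    1) echo "disabled" ;;
    *) echo "unknown" ;;
  esac
}

# Usage:
#   ipv6_status
```

When the result is "unknown", the ip a | grep inet6 check remains the more direct test.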
3. Install and configure the ZooKeeper cluster
1) Extract the ZooKeeper archive into /hadoop
tar -zxvf zookeeper-3.4.6.tar.gz -C /hadoop
2) In /hadoop/zookeeper-3.4.6/conf, edit the ZooKeeper configuration file zoo.cfg (see the figure in the attachment for the exact settings).
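Because the figure with the actual zoo.cfg is missing, the following is only a plausible reconstruction, not the author's file. The helper name write_zoo_cfg is hypothetical. The timing values are the ones shipped in zoo_sample.cfg, 2181 is the client port referenced by ha.zookeeper.quorum later in this article, and 2888/3888 are the conventional peer and election ports. Each server id must equal the number stored in that host's myid file.

```shell
# Hypothetical helper: write a zoo.cfg for a small ensemble.
# Usage: write_zoo_cfg <cfg-file> <data-dir> <id>=<host> [<id>=<host> ...]
write_zoo_cfg() {
  cfg="$1"; data_dir="$2"; shift 2
  {
    # Timing values copied from zoo_sample.cfg; tune as needed.
    echo "tickTime=2000"
    echo "initLimit=10"
    echo "syncLimit=5"
    echo "dataDir=$data_dir"
    # 2181 is the client port that ha.zookeeper.quorum in core-site.xml points at.
    echo "clientPort=2181"
    for pair in "$@"; do
      # 2888 (peer traffic) and 3888 (leader election) are the conventional ports.
      echo "server.${pair%%=*}=${pair#*=}:2888:3888"
    done
  } > "$cfg"
}

# Usage with the myid values given in steps 4) and 6) of this section:
#   write_zoo_cfg /hadoop/zookeeper-3.4.6/conf/zoo.cfg /hadoop/zookeeper-3.4.6/tmp 4=name 2=sname 3=amrm
```

Passing the ids as arguments keeps the file and the myid values in one place, which makes a mismatch between them easier to spot.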
3) Create a tmp directory under /hadoop/zookeeper-3.4.6
mkdir /hadoop/zookeeper-3.4.6/tmp
4) In /hadoop/zookeeper-3.4.6/tmp, create a file named myid and write 4 into it (the number must match this host's server.N entry in zoo.cfg)
vim /hadoop/zookeeper-3.4.6/tmp/myid
5) Copy the configured ZooKeeper to sname and amrm
scp -r /hadoop/zookeeper-3.4.6 root@sname:/hadoop/zookeeper-3.4.6
scp -r /hadoop/zookeeper-3.4.6 root@amrm:/hadoop/zookeeper-3.4.6
6) On sname and amrm, change myid to 2 and 3 respectively.
4. Install and configure the Hadoop cluster
1) Extract the Hadoop archive into /hadoop
tar -zxvf hadoop-2.7.1.tar.gz -C /hadoop
2) Configure the Hadoop environment variables in ~/.bashrc as follows:
# Environment variables for Hadoop
export JAVA_HOME=/usr/lib/java/jdk1.7.0_79
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/hadoop/hadoop-2.7.1
export ZOOKEEPER_HOME=/hadoop/zookeeper-3.4.6
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin:${PATH}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export YARN_HOME=${HADOOP_HOME}
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
5. Configure Hadoop
All Hadoop 2.7.1 configuration files live in /hadoop/hadoop-2.7.1/etc/hadoop.
cd /hadoop/hadoop-2.7.1/etc/hadoop
1) Edit hadoop-env.sh and set the JDK home directory
export JAVA_HOME=/usr/lib/java/jdk1.7.0_79
2) Edit core-site.xml
<configuration>
  <!-- Set the HDFS nameservice to ns -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
  <!-- ZooKeeper addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>name:2181,sname:2181,amrm:2181</value>
  </property>
</configuration>
3) Edit hdfs-site.xml
<configuration>
  <!-- Set the HDFS nameservice to ns; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns</value>
  </property>
  <!-- ns has two NameNodes, nm and snm -->
  <property>
    <name>dfs.ha.namenodes.ns</name>
    <value>nm,snm</value>
  </property>
  <!-- RPC address of nm -->
  <property>
    <name>dfs.namenode.rpc-address.ns.nm</name>
    <value>name:9000</value>
  </property>
  <!-- HTTP address of nm -->
  <property>
    <name>dfs.namenode.http-address.ns.nm</name>
    <value>name:50070</value>
  </property>
  <!-- RPC address of snm -->
  <property>
    <name>dfs.namenode.rpc-address.ns.snm</name>
    <value>sname:9000</value>
  </property>
  <!-- HTTP address of snm -->
  <property>
    <name>dfs.namenode.http-address.ns.snm</name>
    <value>sname:50070</value>
  </property>
  <!-- If hadoop.tmp.dir is set in core-site.xml these need not be set here; otherwise add the following two properties -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/dfs/data</value>
  </property>
  <!-- Where the NameNode metadata (edit log) is stored on the JournalNodes; including amrm makes the cluster more robust -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://name:8485;sname:8485;amrm:8485/ns</value>
  </property>
  <!-- Where each JournalNode stores its data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/hadoop/journal</value>
  </property>
  <!-- Enable automatic failover when a NameNode fails -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Implementation used by clients for automatic failover -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; multiple methods are separated by newlines, one method per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <!-- The sshfence method requires passwordless SSH login -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_dsa</value>
  </property>
  <!-- Connection timeout for the sshfence method -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
4) Edit mapred-site.xml
<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- History server addresses -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>name:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>name:19888</value>
  </property>
  <!-- This is a directory in the distributed file system; start DFS first, then the history server -->
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/history/indone</value>
  </property>
  <!-- This is a directory in the distributed file system; start DFS first, then the history server -->
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/history/done</value>
  </property>
</configuration>
5) Edit yarn-site.xml
<configuration>
  <!-- ResourceManager host -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>amrm</value>
  </property>
  <!-- Address the ResourceManager exposes to clients, who use it to submit and kill applications -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <!-- Address the ResourceManager exposes to ApplicationMasters, which use it to request and release resources -->
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <!-- Address the ResourceManager exposes to NodeManagers, which use it to send heartbeats and receive tasks -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <!-- Address the ResourceManager exposes to administrators for management commands. Default: ${yarn.resourcemanager.hostname}:8033 -->
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <!-- ResourceManager web UI address -->
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <!-- Auxiliary service loaded by the NodeManager at startup: the MapReduce shuffle server -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
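6) Edit slaves
The slaves file specifies where the worker nodes are. Because HDFS is started on name and YARN is started on amrm, the slaves file on name specifies the DataNode locations and the slaves file on amrm specifies the NodeManager locations.
cd /hadoop/hadoop-2.7.1/etc/hadoop/
vim slaves
with the contents:
name
sname
amrm
Note: the slaves file on name lists the DataNode and JournalNode addresses; the slaves file on amrm lists the NodeManager addresses.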
6. Copy the configured Hadoop to sname and amrm
scp -r /hadoop/hadoop-2.7.1 root@sname:/hadoop/
scp -r /hadoop/hadoop-2.7.1 root@amrm:/hadoop/
*********************Note: the following steps must be carried out strictly in order*****************************
7. Start the ZooKeeper cluster (on name, sname, and amrm, from /hadoop/zookeeper-3.4.6/bin)
cd /hadoop/zookeeper-3.4.6/bin
# start in order on name, sname, amrm
./zkServer.sh start    # start the ZooKeeper node
./zkServer.sh status   # check the ZooKeeper status
8. Start the JournalNodes from /hadoop/hadoop-2.7.1/sbin. Running the command on name alone is enough; note that the script is hadoop-daemons.sh, not hadoop-daemon.sh.
hadoop-daemons.sh start journalnode
jps   (run on each node in turn and check that a JournalNode process has appeared)
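Checking jps output by eye on three nodes is easy to get wrong, so the comparison can be scripted. A minimal sketch, assuming the hypothetical helper name missing_daemons; it relies only on the standard jps output format of one "pid MainClass" pair per line.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
  jps_output="$(cat)"
  for daemon in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the second column exactly.
    if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}

# Usage at this point of the setup (ZooKeeper and JournalNode started):
#   jps | missing_daemons QuorumPeerMain JournalNode
```

Empty output means every expected daemon is running; the same function can be reused after later steps with the process names from the planning table.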
9. Format HDFS: run the format command on name
hdfs namenode -format ns
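Formatting generates files under the directory configured as hadoop.tmp.dir in core-site.xml; here I configured /hadoop/hadoop-2.7.1/tmp. Then copy /hadoop/hadoop-2.7.1/tmp to /hadoop/hadoop-2.7.1/tmp on sname and amrm.
scp -r /hadoop/hadoop-2.7.1/dfs root@sname:/hadoop/hadoop-2.7.1
scp -r /hadoop/hadoop-2.7.1/dfs root@amrm:/hadoop/hadoop-2.7.1
Note: do not casually delete the directories generated by formatting, otherwise startup reports an inconsistency exception.
10. Format ZK: run the format command on name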
hdfs zkfc -formatZK
11. Start HDFS: on name, run start-dfs.sh from /hadoop/hadoop-2.7.1/sbin
cd /hadoop/hadoop-2.7.1/sbin/
start-dfs.sh
After startup, run jps on name, sname, and amrm and check that the NameNode and DFSZKFailoverController processes have appeared (on name and sname).
12. Start the history server: on name, from /hadoop/hadoop-2.7.1/sbin, run
hdfs dfs -mkdir /history
hdfs dfs -mkdir /history/indone
hdfs dfs -mkdir /history/done
mr-jobhistory-daemon.sh start historyserver
13. Start YARN
On amrm, run start-yarn.sh from /hadoop/hadoop-2.7.1/sbin
cd /hadoop/hadoop-2.7.1/sbin/
start-yarn.sh
start-yarn.sh is run on amrm. The NameNode and the ResourceManager are kept on separate machines for performance reasons, because both consume a lot of resources. Since they are on different machines, each has to be started on its own host.
14. At this point the Hadoop 2.7.1 configuration is complete; you can check in a browser whether the deployment succeeded
1) http://192.168.32.137:50070 NameNode
2) http://192.168.32.136:8088 ResourceManager
3) http://192.168.32.137:19888 JobHistoryServer
15. Run a job (from /hadoop/hadoop-2.7.1, since the paths below are relative)
1) hdfs dfs -mkdir /test
2) hdfs dfs -mkdir /test/input
3) hdfs dfs -put etc/hadoop/*.xml /test/input
4) hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep /test/input /test/output 'dfs[a-z.]+'
[img]http://dl2.iteye.com/upload/attachment/0122/5366/817ca7cf-a5b0-307c-bdd8-40ede16677f4.png[/img]
5) hdfs dfs -get /test/output output   # copies into the current directory
6) cat output/*   # view the result
Note: another way to view the result
hdfs dfs -cat /test/output/*
Check the job status:
JobHistoryServer:
16. Shut down Hadoop
On amrm:
stop-yarn.sh
On name:
mr-jobhistory-daemon.sh stop historyserver
stop-dfs.sh
17. Leave safe mode
hdfs dfsadmin -safemode leave
Note: if anything goes wrong in the steps above, check the relevant log files.
Related exceptions
1. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
After a power loss, Hadoop HA reported org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. The web UI showed that both nodes had become standby, so one of them has to be switched to active:
hdfs haadmin -transitionToActive --forcemanual nm
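Before forcing a transition it helps to confirm that both NameNodes really are in standby. The sketch below classifies the two states; the function name ha_action and its output labels are made up for illustration, while hdfs haadmin -getServiceState is the standard command for printing a NameNode's state ("active" or "standby").

```shell
# Hypothetical helper: decide what to do from the two NameNode states,
# as printed by `hdfs haadmin -getServiceState <id>`.
ha_action() {
  case "$1/$2" in
    standby/standby) echo "promote-one" ;;   # no active NameNode: manual transition needed
    active/active)   echo "split-brain" ;;   # should not happen when fencing works
    active/standby|standby/active) echo "healthy" ;;
    *) echo "unknown" ;;                     # e.g. a NameNode that is down
  esac
}

# Usage with the NameNode ids nm and snm from hdfs-site.xml:
#   ha_action "$(hdfs haadmin -getServiceState nm)" "$(hdfs haadmin -getServiceState snm)"
```

Only the "promote-one" case calls for the forced manual transition shown above; in the "unknown" case the logs of the unreachable NameNode are the first place to look.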
2. org.apache.hadoop.ipc.Client: Retrying connect to server: amrm/192.168.32.136:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
telnet: Unable to connect to remote host: Connection refused Ubuntu 15.10
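Check whether the host can be pinged and whether the port is open. If ping works and the port is open, use the following command to see which addresses the system is listening on: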
netstat -ntulp
Make sure the Local Address is 0.0.0.0 or 192.168.32.137.
Fix: correct the address mapping in /etc/hosts.
3. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
The NameNode being addressed is in the standby state.
4. org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://sname:9000/user/root/grep-temp-1382738569
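This happens because the files produced by the map phase are placed under /user/${username} in the distributed file system; create that directory:
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/${username}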
5. java.io.IOException: Unknown Job job_1450012188054_0001
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:218)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getCounters(HistoryClientService.java:232)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientProtocolPBServiceImpl.java:159)
at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:281)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
Fix:
hdfs dfs -chmod -R 777 /history
6.
Fix: add an address mapping for jamel in /etc/hosts.
Notes:
1. Output printed when a job succeeds
15/12/13 22:17:44 INFO mapreduce.Job: Job job_1450012188054_0002 completed successfully
15/12/13 22:17:45 INFO mapreduce.Job: Counters: 50
  File System Counters
    FILE: Number of bytes read=493
    FILE: Number of bytes written=1176179
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=33949
    HDFS: Number of bytes written=663
    HDFS: Number of read operations=30
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
  Job Counters
    Killed map tasks=3
    Launched map tasks=12
    Launched reduce tasks=1
    Data-local map tasks=12
    Total time spent by all maps in occupied slots (ms)=1450715
    Total time spent by all reduces in occupied slots (ms)=112387
    Total time spent by all map tasks (ms)=1450715
    Total time spent by all reduce tasks (ms)=112387
    Total vcore-seconds taken by all map tasks=1450715
    Total vcore-seconds taken by all reduce tasks=112387
    Total megabyte-seconds taken by all map tasks=1485532160
    Total megabyte-seconds taken by all reduce tasks=115084288
  Map-Reduce Framework
    Map input records=926
    Map output records=17
    Map output bytes=508
    Map output materialized bytes=541
    Input split bytes=969
    Combine input records=17
    Combine output records=15
    Reduce input groups=15
    Reduce shuffle bytes=541
    Reduce input records=15
    Reduce output records=15
    Spilled Records=30
    Shuffled Maps =9
    Failed Shuffles=0
    Merged Map outputs=9
    GC time elapsed (ms)=67395
    CPU time spent (ms)=15090
    Physical memory (bytes) snapshot=1492398080
    Virtual memory (bytes) snapshot=6682742784
    Total committed heap usage (bytes)=1178963968
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=32980
  File Output Format Counters
    Bytes Written=663
2. The cluster built in this article provides high availability for the NameNode only. For ResourceManager HA, see:
http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html