Compiling Hadoop 2.7.4 on Ubuntu 12.04

[Big Data - Hadoop]

Build steps:

I. Download the packages needed for the build (the versions below are the ones I used)

1. hadoop-2.7.4-src.tar.gz (2.7.4 was the latest stable release at the time)
2. apache-ant-1.9.4-bin.tar.gz
3. protobuf-2.5.0.tar.gz
4. findbugs-3.0.1.tar.gz
5. apache-maven-3.2.5-bin.tar.gz
6. jdk-7u80-linux-x64.tar.gz

II. Unpack the packages and configure the environment variables

# vim /etc/profile
export JAVA_HOME=/cloud/jdk1.7.0_80
export JRE_HOME=/cloud/jdk1.7.0_80/jre
export MAVEN_HOME=/cloud/apache-maven-3.2.5
export MAVEN_OPTS="-Xms256m -Xmx512m"
export ANT_HOME=/cloud/apache-ant-1.9.4
export FINDBUGS_HOME=/cloud/findbugs-3.0.1
export PATH=$PATH:$JAVA_HOME/bin:$MAVEN_HOME/bin:$ANT_HOME/bin:$FINDBUGS_HOME/bin:$JRE_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# source /etc/profile
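After sourcing the profile, it is worth confirming that every build tool is actually reachable. A small hedged sketch that only reports, never fails:

```shell
# Report which build tools are on PATH after `source /etc/profile`.
# Prints "ok" or "missing" per tool rather than aborting, so it is safe anywhere.
for tool in java javac mvn ant protoc findbugs; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "ok      $tool ($(command -v "$tool"))"
    else
        echo "missing $tool"
    fi
done
```

If anything prints "missing", recheck the corresponding export in /etc/profile before starting the build.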

III. Build process

step1:
# cd /cloud/src/hadoop2.7/hadoop-2.7.4-src
# cd hadoop-maven-plugins/
# mvn install

step2:
# cd /cloud/src/hadoop2.7/hadoop-2.7.4-src
# mvn package -Pdist -DskipTests -Dtar
Note: this step takes a very long time. If the build hangs with no output for a long while, interrupt it with Ctrl+C and rerun the same command until you see BUILD SUCCESS.
If you only want to compile the source rather than debug it, you can stop after this step. The generated tarball is at:
/cloud/src/hadoop2.7/hadoop-2.7.4-src/hadoop-dist/target/hadoop-2.7.4.tar.gz
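A frequent cause of failures in this step is a protoc version mismatch: Hadoop 2.7.x requires protoc 2.5.0 exactly. A quick hedged check before (re)running the build:

```shell
# Hadoop 2.7.x requires protoc 2.5.0 exactly; any other version aborts the
# build with an error complaining that the expected version is 2.5.0.
required="2.5.0"
found=$(protoc --version 2>/dev/null | awk '{print $2}')
if [ "$found" = "$required" ]; then
    echo "protoc $found - ok"
else
    echo "protoc is '${found:-not installed}', need $required"
fi
```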

step3:
# cd /cloud/src/hadoop2.7/hadoop-2.7.4-src
# mvn eclipse:eclipse -DskipTests
This step is for reading and debugging the source code: after it finishes, import the generated project into Eclipse as a Maven project (I have not tried this myself).

IV. Installing Hadoop 2.7.4
1. Prepare the software

jdk1.7.0_80
hadoop-2.7.4.tar.gz
zookeeper-3.3.6.tar.gz

2. Unpack and set the environment variables (steps omitted; the profile configuration is below)

# vim /etc/profile
export JAVA_HOME=/cloud/jdk1.7.0_80
export ZOOKEEPER_HOME=/cloud/zookeeper-3.3.6
export HADOOP_HOME=/cloud/hadoop-2.7.4
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

3. Passwordless SSH login

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys
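The result can be verified immediately. BatchMode makes ssh fail instead of prompting for a password, so the check below is non-interactive (a hedged sketch for the local node; substitute node2 etc. to test the other hosts):

```shell
# BatchMode=yes forbids password prompts, so this succeeds only when key-based
# login works; StrictHostKeyChecking=no skips the first-connection confirmation.
if ssh -o BatchMode=yes -o StrictHostKeyChecking=no localhost true 2>/dev/null; then
    echo "passwordless ssh to localhost: ok"
else
    echo "passwordless ssh to localhost: NOT working"
fi
```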

4. Edit the Hadoop 2.7.4 configuration files
hadoop-env.sh
hdfs-site.xml
core-site.xml
yarn-site.xml
slaves
Below are my configuration files.
4.1 Set JAVA_HOME in hadoop-env.sh

# cd /cloud/hadoop-2.7.4/etc/hadoop
# vim hadoop-env.sh
export JAVA_HOME=/cloud/jdk1.7.0_80

4.2 hdfs-site.xml configuration

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- dfs.nameservices: the logical name for this cluster (mycluster) -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- dfs.ha.namenodes.[nameservice ID]: unique identifiers for each NameNode -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- dfs.namenode.rpc-address.[nameservice ID].[name node ID]  -->
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>202.96.64.8:8020</value>
    </property>
    <property>
       <name>dfs.namenode.rpc-address.mycluster.nn2</name>
       <value>202.96.64.10:8020</value>
    </property>
    <!-- dfs.namenode.http-address.[nameservice ID].[name node ID] the fully-qualified HTTP address for each NameNode to listen on  Similarly to rpc-address above -->
    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>202.96.64.8:50070</value>
    </property>
    <property>
       <name>dfs.namenode.http-address.mycluster.nn2</name>
       <value>202.96.64.10:50070</value>
    </property>
    <!-- the URI which identifies the group of JNs where the NameNodes will write/read edits -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://202.96.64.10:8485;202.96.64.12:8485;202.96.64.14:8485/mycluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>

    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/jn/data</value>
    </property>
    <!-- configure automatic failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>ha.zookeeper.quorum</name>
        <value>202.96.64.8:2181,202.96.64.10:2181,202.96.64.12:2181</value>
    </property>

</configuration>

4.3 core-site.xml configuration

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>202.96.64.8:2181,202.96.64.10:2181,202.96.64.12:2181</value>
    </property>

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop2</value>
    </property>

    <property>
      <name>hadoop.proxyuser.hadoop.hosts</name>
      <value>*</value>
    </property>

    <property>
      <name>hadoop.proxyuser.hadoop.groups</name>
      <value>*</value>
    </property>
</configuration>

4.4 yarn-site.xml configuration

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>202.96.64.8</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

</configuration>

4.5 slaves configuration

202.96.64.10
202.96.64.12
202.96.64.14

4.6 ZooKeeper configuration


# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/opt/zookeeper
# the port at which the clients will connect
clientPort=2181

server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888

Note: create the /opt/zookeeper directory yourself on each node, and create the file /opt/zookeeper/myid containing that node's server number.
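The note above can be scripted per node. ZK_DATA and MYID are illustrative variables; MYID must match the N in that host's server.N line (1 on node1, 2 on node2, 3 on node3):

```shell
# Create the ZooKeeper data directory and its myid file on this node.
# MYID must match this host's server.N entry in the ZooKeeper config.
ZK_DATA="${ZK_DATA:-/opt/zookeeper}"
MYID="${MYID:-1}"
mkdir -p "$ZK_DATA"
echo "$MYID" > "$ZK_DATA/myid"
echo "wrote myid=$(cat "$ZK_DATA/myid") under $ZK_DATA"
```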

5. Initialize the cluster

1. Start the three JournalNodes:
node1# hadoop-daemon.sh start journalnode
node2# hadoop-daemon.sh start journalnode
node3# hadoop-daemon.sh start journalnode
If the JournalNode process appears (for example in jps output), the daemon started successfully.

2. Format HDFS on one of the NameNodes (e.g. node1):
node1# hdfs namenode -format
node1# hadoop-daemon.sh start namenode
(Check the log for errors to confirm the NameNode started: cd ../logs && tail -n50 hadoop-root-namenode-node1.log)
Start the second NameNode (switch to node2; note: one NameNode must already be running):
node2# hdfs namenode -bootstrapStandby
To verify, check that files now exist under the data directory configured above, /opt/hadoop2.
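That verification can be sketched as a small check (assuming hadoop.tmp.dir=/opt/hadoop2 from core-site.xml; the exact file names under current/ vary by version):

```shell
# After a successful format/bootstrap, the NameNode keeps its metadata
# (VERSION, fsimage*) under hadoop.tmp.dir; list a few entries if present.
HDFS_DIR="${HDFS_DIR:-/opt/hadoop2}"
if [ -d "$HDFS_DIR" ]; then
    find "$HDFS_DIR" \( -name 'fsimage*' -o -name VERSION \) 2>/dev/null | head -n 5
    echo "$HDFS_DIR exists"
else
    echo "$HDFS_DIR does not exist yet - the format probably failed"
fi
```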

3. Format ZKFC on one of the NameNodes (from /root/hadoop-2.7.4/bin):
node1# hdfs zkfc -formatZK

4. Stop the daemons started above (from /root/hadoop-2.7.4/sbin):
node1# stop-dfs.sh

6. Start the cluster again

node1# zkServer.sh start
node2# zkServer.sh start
node3# zkServer.sh start
node1# start-all.sh
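Whether everything came up can be confirmed with jps on each node: every role's daemon (NameNode, DataNode, JournalNode, DFSZKFailoverController, ResourceManager, NodeManager, QuorumPeerMain) should appear on the node that hosts it. A tolerant sketch:

```shell
# List running Java daemons; each Hadoop/ZooKeeper role should show up in
# jps output on its own node. Reports instead of failing if jps is absent.
if command -v jps >/dev/null 2>&1; then
    jps
else
    echo "jps not found - is \$JAVA_HOME/bin on PATH?"
fi
```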

7. Access the web UI

http://202.96.64.8:50070/

8. Additional pitfalls to watch for

8.1 In /etc/hosts, comment out the 127.0.0.1 loopback mappings for the hostname

node1# vim /etc/hosts
# 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# 127.0.0.1 node1

202.96.64.8 node1
202.96.64.10 node2
202.96.64.12 node3
202.96.64.14 node4
202.96.64.16 node5
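Why this matters: if a node's hostname resolves to 127.0.0.1, its daemons bind to loopback and the other nodes cannot reach them. A hedged check of what the resolver actually returns:

```shell
# getent consults /etc/hosts (and DNS) much like the JVM's resolver does;
# a 127.* answer here means the hosts file still maps this host to loopback.
host=$(hostname)
addr=$(getent hosts "$host" | awk '{print $1; exit}')
case "$addr" in
    127.*) echo "WARNING: $host resolves to $addr (loopback)" ;;
    "")    echo "WARNING: $host does not resolve at all" ;;
    *)     echo "$host resolves to $addr - ok" ;;
esac
```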

8.2 Disable the firewall and SELinux.

Reposted from blog.csdn.net/u012425536/article/details/78769904