Compiling Hadoop and Obtaining the Native Libraries

Document notes
    This document was written from memory after the build was completed and has not been verified with a fresh rebuild; the overall steps should be correct, but corrections are welcome if you find mistakes.
    The original purpose of the build was to obtain a suitable native package and to check how a manually compiled native package differs from the bundled one.
    The versions compiled are apache-hadoop-2.7.6 and hadoop-2.6.0-cdh5.5.0.

Build notes
    The source tree contains a BUILDING.txt file; most of what you need, including the dependencies and the build commands, can be found there.
    For example, the requirements:

        Requirements:
        * Unix System
        * JDK 1.7+
        * Maven 3.0 or later
        * Findbugs 1.3.9 (if running findbugs)
        * ProtocolBuffer 2.5.0
        * CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
        * Zlib devel (if compiling native code)
        * openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
        * Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
        * Internet connection for first build (to fetch all Maven and Hadoop dependencies)

Build environment
    $ uname -a    # Linux e3base01 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
    $ java -version    # java version "1.7.0_04"
    $ mvn -v    # Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T01:29:23+08:00)
    # findbugs is not installed.
    $ protoc --version    # libprotoc 2.5.0
    $ cmake -version    # cmake version 2.8.12.2
    $ yum list installed | grep zlib-devel    # zlib-devel.x86_64       1.2.3-29.el6    @base
    $ yum list installed | grep openssl-devel    # openssl-devel.x86_64    1.0.1e-57.el6   @base
    $ yum list installed | grep fuse    # fuse.x86_64             2.8.3-4.el6     @anaconda-CentOS-201311272149.x86_64/6.5    # shipped with the system image; not installed deliberately.
    ## Reserve at least 5 GB of free space for the build directory; after the build it occupies close to 4 GB.

Installing the dependencies
    jdk
        Download the tarball and untar it.
        Configure the environment variables, e.g. add to ~/.bash_profile:
            export JAVA_HOME=/path/to/jdk
            export PATH=$JAVA_HOME/bin:$PATH
        $ source ~/.bash_profile # takes effect immediately.
        Verify:
            $ java -version
    maven
        Download the tarball and untar it.
        Configure the environment variables, e.g. add to ~/.bash_profile:
            export MAVEN_HOME=/path/to/maven
            export MAVEN_OPTS="-Xmx4g -Xms4g"
            export PATH=$MAVEN_HOME/bin:$PATH
        $ source ~/.bash_profile # takes effect immediately.
        Verify:
            $ mvn -v
    protobuf
        Download the tarball and untar it.
        $ ./configure
        $ make
        # make install    # run as root.
        Verify:
            $ protoc --version
    cmake, zlib-devel, openssl-devel
        $ yum install cmake zlib-devel openssl-devel
        Verify:
            $ yum list installed | egrep 'cmake|zlib-devel|openssl-devel'
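The per-tool verification steps above can be collected into a single check. A minimal sketch, assuming the tool names listed in BUILDING.txt and this document (java, mvn, protoc, cmake); the `check_tools` helper is illustrative, not part of any Hadoop tooling:

```shell
#!/bin/sh
# Report which of the required build tools are reachable on PATH.
# Returns 0 only if every tool is found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools java mvn protoc cmake || echo "install the missing tools before building"
```

Note that this only confirms the tools exist; it does not check versions (e.g. protoc must be exactly 2.5.0 for these Hadoop versions), so still run the `-version` commands above.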

Build commands
    ## Build and generate the native libraries. If the snappy or openssl packages are unavailable, the build fails outright.    # Not tested.
    $ mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.snappy -Drequire.openssl
    ## Build and generate the native libraries, and bundle the system's snappy, openssl, and related shared libraries into the native directory. Make sure snappy.lib and openssl.lib point to the correct system lib directories.
    $ mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.snappy -Dsnappy.lib=/usr/lib64 -Dbundle.snappy -Drequire.openssl -Dopenssl.lib=/usr/lib64 -Dbundle.openssl
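After a successful `-Pdist` build, the distribution (including lib/native) lands under hadoop-dist/target. A small sketch for inspecting the result; the `list_native` helper and the sample path are illustrative, and the version segment in the path must match what you built:

```shell
#!/bin/sh
# List the shared objects produced in a native directory.
list_native() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no native dir: $dir" >&2
        return 1
    fi
    # Top-level .so files and their versioned siblings.
    find "$dir" -maxdepth 1 -name '*.so*' | sort
}

list_native hadoop-dist/target/hadoop-2.7.6/lib/native || echo "build output not found; adjust the path"
```

Comparing this listing between a plain build and a `-Dbundle.snappy -Dbundle.openssl` build is how the differences described later in this document were observed.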

Problems downloading dependencies during the build
    You will likely hit errors during the build: Maven typically prints "download xxx" and then fails, reporting that a dependency could not be downloaded.
    The build requires JDK 1.7, and some repositories no longer accept connections from JDK 1.7; this is reportedly a network-protocol issue, and switching to JDK 1.8 resolves it.
    Since this build is pinned to 1.7, the workarounds below are the available options:
        Switch repositories, e.g. add the Aliyun mirror to Maven's settings file. When downloading, the same artifact is tried against the configured repositories in turn, and once the Aliyun mirror is reached the download succeeds.
        Occasionally a download hangs; press Ctrl-C and restart the build.
        Sometimes switching repositories does not help either, and the only option is to manually place the artifact into Maven's local repository.
        A convenient way is to copy the artifact URL shown in the build output, cd into the corresponding directory of the local repository, and fetch the package with wget.
        Alternatively, copy a reasonably complete local repository from elsewhere; the repository provided in the attachment may not be complete, but it covers most dependencies and can save some time.
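For the mirror workaround above, a minimal settings file can be generated as sketched below. The Aliyun URL is its public mirror endpoint; the script writes to ./settings.xml.aliyun rather than ~/.m2/settings.xml so it cannot clobber an existing configuration, and if you already have a settings.xml you should merge the <mirror> element in instead:

```shell
#!/bin/sh
# Generate a minimal Maven settings file that routes the "central"
# repository through the Aliyun public mirror.
cat > settings.xml.aliyun <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>aliyun</id>
      <name>Aliyun public mirror</name>
      <mirrorOf>central</mirrorOf>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>
EOF
echo "wrote settings.xml.aliyun"
```

After reviewing it, copy it to ~/.m2/settings.xml (or pass it explicitly with `mvn -s settings.xml.aliyun ...`).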

Build results
# The apache-hadoop-2.7.6 build completed.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main ................................. SUCCESS [  1.413 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [  1.060 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [  1.873 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [  3.796 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.214 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  1.974 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  5.477 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [  8.330 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [  5.958 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  2.701 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [01:28 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [  4.182 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 10.096 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.040 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [04:20 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 38.823 s]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [  4.453 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [  3.027 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.058 s]
[INFO] hadoop-yarn ........................................ SUCCESS [  0.036 s]
[INFO] hadoop-yarn-api .................................... SUCCESS [02:54 min]
[INFO] hadoop-yarn-common ................................. SUCCESS [ 18.703 s]
[INFO] hadoop-yarn-server ................................. SUCCESS [  0.040 s]
[INFO] hadoop-yarn-server-common .......................... SUCCESS [  8.185 s]
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 15.118 s]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [  4.324 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 16.786 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 14.241 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [  4.823 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [  5.175 s]
[INFO] hadoop-yarn-server-sharedcachemanager .............. SUCCESS [  2.580 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [  0.046 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [  1.914 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [  1.467 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [  0.041 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [  3.651 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [  3.318 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [  0.198 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 13.687 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 14.004 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [  3.039 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [  6.411 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [  5.591 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [  4.838 s]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [  2.332 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  4.209 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [  2.373 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [  3.223 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  7.516 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  1.895 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [  3.894 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [  2.840 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  1.690 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [  1.636 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  3.118 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [  5.782 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  4.930 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [01:19 min]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [  3.164 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [  7.182 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  0.910 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  5.668 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [  7.558 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.031 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [04:35 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19:52 min
[INFO] Finished at: 2018-12-12T16:40:32+08:00
[INFO] Final Memory: 242M/3915M
[INFO] ------------------------------------------------------------------------


# The hadoop-2.6.0-cdh5.5.0 build completed.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main ................................. SUCCESS [  1.799 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [  0.969 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [  2.744 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.317 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  1.852 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  2.922 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [  3.499 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [  3.985 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  2.539 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [01:23 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [  4.392 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 10.646 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.040 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [04:28 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [01:18 min]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [  3.934 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [  3.894 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.061 s]
[INFO] hadoop-yarn ........................................ SUCCESS [  0.043 s]
[INFO] hadoop-yarn-api .................................... SUCCESS [02:01 min]
[INFO] hadoop-yarn-common ................................. SUCCESS [ 18.434 s]
[INFO] hadoop-yarn-server ................................. SUCCESS [  0.027 s]
[INFO] hadoop-yarn-server-common .......................... SUCCESS [  8.025 s]
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 23.947 s]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [  7.871 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [  6.325 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 16.706 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [  0.896 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [  4.011 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [  0.033 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [  2.224 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [  1.695 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [  0.042 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [  4.132 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [  4.410 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [  0.159 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 18.474 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 19.756 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [  3.631 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [  7.248 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [  4.414 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [  4.481 s]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [  1.721 s]
[INFO] hadoop-mapreduce-client-nativetask ................. SUCCESS [01:05 min]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  3.870 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [  4.490 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [  3.039 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  6.679 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  1.838 s]
[INFO] Apache Hadoop Archive Logs ......................... SUCCESS [  2.622 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 30.751 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 58.028 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  2.962 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [  1.581 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  2.339 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [  5.728 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  3.728 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [  7.629 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [  3.251 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [  4.500 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  1.157 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  3.543 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [  9.379 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.088 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [03:02 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19:23 min
[INFO] Finished at: 2018-12-12T17:22:39+08:00
[INFO] Final Memory: 258M/4065M
[INFO] ------------------------------------------------------------------------

Obtaining the officially bundled native libraries
    hadoop-2.7.6
        Download the tarball and untar it; the native libraries are under:
        /path/to/hadoop/lib/native
    hadoop-2.6.0-cdh5.5.0
        Download the rpm:  https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.5.0/RPMS/x86_64/hadoop-2.6.0+cdh5.5.0+921-1.cdh5.5.0.p0.15.el6.x86_64.rpm
        $ rpm2cpio hadoop-2.6.0+cdh5.5.0+921-1.cdh5.5.0.p0.15.el6.x86_64.rpm | cpio -div
        The extracted native libraries are then under ./usr/lib/hadoop/lib/native

Comparing the compiled native libraries
    hadoop-2.7.6
        When the options -Drequire.snappy -Dsnappy.lib=/usr/lib64 -Dbundle.snappy -Drequire.openssl -Dopenssl.lib=/usr/lib64 -Dbundle.openssl are added,
            compared with a build without them, files such as libcrypto.so*, libk5crypto.so*, and libsnappy.so* are added to the native directory.
            libk5crypto.so is a dangling symlink; you have to find the shared library it was meant to point at and copy it into the native directory by hand.
        Without those options, compared with the bundled libraries, the shared library files change in size, but no shared libraries are missing or added.
    hadoop-2.6.0-cdh5.5.0
        When the options -Drequire.snappy -Dsnappy.lib=/usr/lib64 -Dbundle.snappy -Drequire.openssl -Dopenssl.lib=/usr/lib64 -Dbundle.openssl are added,
            compared with a build without them, files such as libcrypto.so*, libk5crypto.so*, and libsnappy.so* are added to the native directory.
            libk5crypto.so is a dangling symlink; you have to find the shared library it was meant to point at and copy it into the native directory by hand.
        Without those options, compared with the bundled libraries, libhdfs.so and libhdfs.so.0.0.0 are added, while libsnappy.so, libsnappy.so.1, and libsnappy.so.1.1.3 are missing.
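The manual fix for the dangling libk5crypto.so symlink described above can be sketched as follows. The `fix_dangling` helper is illustrative; the /usr/lib64 search directory matches this document's environment, and the sample native path is a placeholder:

```shell
#!/bin/sh
# Replace a dangling symlink in the native directory with a real copy
# of a same-named library found under the given search directory.
fix_dangling() {
    link=$1; searchdir=$2
    # Only act on symlinks whose target no longer resolves.
    [ -L "$link" ] && [ ! -e "$link" ] || return 0
    name=$(basename "$link")
    real=$(find "$searchdir" -name "$name*" -type f | head -n 1)
    if [ -n "$real" ]; then
        rm "$link"
        cp "$real" "$link"
        echo "replaced $link with a copy of $real"
    else
        echo "no replacement found for $link" >&2
        return 1
    fi
}

fix_dangling /path/to/hadoop/lib/native/libk5crypto.so /usr/lib64
```

Copying (rather than re-linking) keeps the native directory self-contained, which is the point of bundling in the first place.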

Checking the native libraries after deploying the cluster
    $ hadoop checknative    ## If every entry shows true, the native libraries are working.

18/12/12 19:02:31 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
18/12/12 19:02:31 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /src/hadoop-2.6.0-cdh5.5.0/hadoop-dist/target/hadoop-2.6.0-cdh5.5.0/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /src/hadoop-2.6.0-cdh5.5.0/hadoop-dist/target/hadoop-2.6.0-cdh5.5.0/lib/native/libsnappy.so.1
lz4:     true revision:99
bzip2:   true /lib64/libbz2.so.1
openssl: true /src/hadoop-2.6.0-cdh5.5.0/hadoop-dist/target/hadoop-2.6.0-cdh5.5.0/lib/native/libcrypto.so


    Generally, as long as Hadoop's own shared library (the bundled libhadoop) is usable, the other libraries such as zlib, snappy, lz4, bzip2, and openssl can all be loaded from the system's shared libraries in their default install locations.
    If you want everything in one place, you can copy the system shared libraries into the native directory; checknative will then resolve those libraries from the copies under native.
    Whether manually copying shared libraries into native is fully equivalent to passing the bundle options at build time has not been tested.
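The manual copy described above can be sketched as a small loop. The library list and the /usr/lib64 source directory come from this document's environment; the `bundle_syslibs` helper and the sample native path are illustrative, and whether this is equivalent to build-time bundling remains untested as noted:

```shell
#!/bin/sh
# Copy the system compression/crypto libraries into the Hadoop native
# directory so checknative resolves everything from there.
bundle_syslibs() {
    native=$1; syslib=$2
    for base in libz libsnappy liblz4 libbz2 libcrypto; do
        # -a preserves symlink chains (libx.so -> libx.so.1 -> libx.so.1.2.3).
        cp -a "$syslib/$base".so* "$native"/ 2>/dev/null \
            && echo "bundled $base" || echo "skipped $base (not found)"
    done
}

bundle_syslibs /path/to/hadoop/lib/native /usr/lib64
```

Re-run `hadoop checknative` afterwards and confirm the reported paths now point into the native directory.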

Attachment: most of the dependency artifacts needed for the build.

https://download.csdn.net/download/anyuzun/10846380

Reposted from blog.csdn.net/anyuzun/article/details/84977893