Building a Hadoop Environment on Windows: From Scratch

Step 1: Install a Linux virtual machine on Windows

1. Install VMware Workstation first; I use version 12.5.6 locally.

2. Download the CentOS 6.3 image: CentOS-6.3-x86_64-bin-DVD1.iso.

3. Create the virtual machine.

Step 2: Configure the network

1. Set the IP address and related settings for the host's local network adapter.

2. Configure the virtual machine's network.

(1) Modify the ifcfg-eth0 file.

The command is:

        vi /etc/sysconfig/network-scripts/ifcfg-eth0

The entries to modify are shown in the sketch below (the original screenshot is no longer available).
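A minimal sketch of a typical static-IP ifcfg-eth0 for CentOS 6. The address 192.168.0.105 is assumed from the web-UI URLs used later in this guide; the netmask, gateway, and DNS values are placeholders that must match your own network:

        DEVICE=eth0
        TYPE=Ethernet
        ONBOOT=yes
        BOOTPROTO=static
        # assumed VM address; keep it consistent with the hosts file and web-UI URLs below
        IPADDR=192.168.0.105
        NETMASK=255.255.255.0
        # placeholder values; use your own network's gateway and DNS
        GATEWAY=192.168.0.1
        DNS1=192.168.0.1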

After the changes, restart the network: service network restart

Turn off the firewall: service iptables stop (temporary)

                     chkconfig iptables off (permanent)

3. Modify the hosts file (vi /etc/hosts)

Add the IP address that corresponds to the meritdata hostname, as sketched below.
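A one-line sketch of the /etc/hosts entry, assuming the VM's address is 192.168.0.105 (the address the web UIs use later):

        192.168.0.105   meritdata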

Step 3: Install the JDK (as root)

1. Use FTP to upload the prepared installation files to the /bigdata directory.

2. Unpack the JDK: tar -vxf jdk-7u79-linux-x64.tar.gz

3. In /bigdata, create a symbolic link: ln -s /bigdata/jdk1.7.0_79 java

After creating it, verify with ls -l /bigdata that the java link points to /bigdata/jdk1.7.0_79.

4. Configure the JDK environment variables:

vi /etc/profile

Add the following lines:

        export JAVA_HOME=/bigdata/java

        export CLASSPATH=/bigdata/java/lib

        export PATH=$JAVA_HOME/bin:$PATH

source /etc/profile

5. Check that the JDK installed successfully, as sketched below.
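A quick check; the exact build numbers in the output may differ slightly:

        java -version
        # expected output, roughly:
        # java version "1.7.0_79"
        # Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
        # Java HotSpot(TM) 64-Bit Server VM (mixed mode)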

Step 4: Create a new user

su - root

adduser bigdata

passwd bigdata

cd /

chown -R bigdata:bigdata /bigdata/

Step 5: Set up passwordless SSH login (as the bigdata user)

Switch to the newly created user:

        su - bigdata

Generate the public/private key pair:

        ssh-keygen -t rsa

Import the public key into this machine's authorized keys:

        cd ~/.ssh

        cat id_rsa.pub > authorized_keys

Change the permissions:

        chmod 600 authorized_keys

Test:

        ssh <host> (e.g. ssh meritdata). The first login may ask for a yes confirmation; after that you can log in without a password.

Step 6: Install Hadoop (under the bigdata account)

1. Unpack Hadoop and Hive

    Log in as the bigdata user: su - bigdata

    Go to the /bigdata directory: cd /bigdata

    Unpack Hadoop: tar -vxf hadoop-2.6.0-cdh5.8.2.tar.gz

    Unpack Hive: tar -vxf hive-1.1.0-cdh5.8.2.tar.gz

    Create the hadoop symlink: ln -s hadoop-2.6.0-cdh5.8.2/ hadoop

    Create the hive symlink: ln -s hive-1.1.0-cdh5.8.2/ hive

2. Configure environment variables

    Edit the profile file:

        vi ~/.bash_profile

    Add the Java and Hadoop settings:

        export JAVA_HOME=/bigdata/java

        export PATH=$JAVA_HOME/bin:$PATH

        export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

        export HADOOP_HOME=/bigdata/hadoop

        export PATH=$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH

    Also add Hadoop to /etc/profile:

         Switch to the root user

         vi /etc/profile

              Add: export HADOOP_HOME=/bigdata/hadoop

                        export PATH=$HADOOP_HOME/bin:$PATH

    After saving, run source /etc/profile (and source ~/.bash_profile) so the changes take effect.

3. Modify the Hadoop XML configuration files

Go to the configuration directory: cd /bigdata/hadoop/etc/hadoop

(1)      vi core-site.xml (meritdata is the hostname; replace $SERVER_USER in the proxyuser properties with the account that runs the services, bigdata in this setup)

<configuration>

  <property>

   <name>fs.defaultFS</name>

    <value>hdfs://meritdata:8020</value>

  </property>

  <property>

   <name>hadoop.tmp.dir</name>

   <value>/bigdata/hadoop/tmp</value>

  </property>

  <property>

    <name>hadoop.proxyuser.$SERVER_USER.hosts</name>

    <value>*</value>

  </property>

  <property>

   <name>hadoop.proxyuser.$SERVER_USER.groups</name>

    <value>*</value>

  </property>

</configuration>

(2)      vi hdfs-site.xml

<configuration>

<property>

    <name>dfs.replication</name>

    <value>1</value>

</property>

</configuration>

(3)      vi mapred-site.xml

This directory does not ship with a mapred-site.xml file, so create one from the template:

cp mapred-site.xml.template mapred-site.xml

 <configuration>

 <property>

   <name>mapreduce.framework.name</name>

     <value>yarn</value>

 </property>

</configuration>

(4)      vi yarn-site.xml (note: yarn.nodemanager.resource.memory-mb should not exceed the memory actually available to the VM)

<configuration>

        <property>

                <name>yarn.nodemanager.aux-services</name>

                <value>mapreduce_shuffle</value>

        </property>

        <property>

                <name>yarn.scheduler.minimum-allocation-mb</name>

                <value>1024</value>

        </property>

        <property>

                <name>yarn.scheduler.maximum-allocation-mb</name>

                <value>10240</value>

        </property>

        <property>

                <name>yarn.nodemanager.resource.memory-mb</name>

                <value>92160</value>

        </property>

</configuration>

(5)      vi hadoop-env.sh (only the JAVA_HOME line is modified; the rest is the stock file content)

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/bigdata/jdk1.7.0_79

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol.  Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options.  Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"

export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"

export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"

#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol.  This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored.  $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
#       the user that will run the hadoop daemons.  Otherwise there is the
#       potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

4. Format the NameNode

/bigdata/hadoop/bin/hdfs namenode -format

(The output should end with a message saying the storage directory has been successfully formatted.)

5. Start Hadoop

sbin/start-all.sh
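After startup, jps should show the Hadoop daemons running on this single node (process IDs omitted; the list below is roughly what is expected):

        jps
        # NameNode
        # DataNode
        # SecondaryNameNode
        # ResourceManager
        # NodeManager
        # Jps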

6. You can now create directories in HDFS

Create a directory: hadoop fs -mkdir /user

Check that it was created: hadoop fs -ls /

         Monitor Hadoop through the web UIs:

                   http://192.168.0.105:50070 (HDFS NameNode)

                   http://192.168.0.105:8088/cluster (YARN ResourceManager)
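As a further check, a quick file round trip through HDFS (the file name here is just an example):

        echo "hello hdfs" > /tmp/test.txt
        hadoop fs -put /tmp/test.txt /user
        hadoop fs -cat /user/test.txt
        hadoop fs -rm /user/test.txt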

Step 7: Install MySQL (as root)

1. Check whether a default MySQL installation already exists

Case-insensitive search: rpm -qa | grep -i mysql

Remove any existing packages, for example: rpm -e MySQL-devel-5.5.20-1.el6.x86_64 --nodeps

2. Unpack MySQL

tar -vxf MySQL-5.5.20-1.el6.x86_64.tar

3. Install MySQL from the RPM packages

rpm -ivh MySQL-client-5.5.20-1.el6.x86_64.rpm

rpm -ivh MySQL-devel-5.5.20-1.el6.x86_64.rpm

rpm -ivh MySQL-embedded-5.5.20-1.el6.x86_64.rpm

rpm -ivh MySQL-server-5.5.20-1.el6.x86_64.rpm

rpm -ivh MySQL-shared-5.5.20-1.el6.x86_64.rpm

rpm -ivh MySQL-test-5.5.20-1.el6.x86_64.rpm

4. cp /usr/share/mysql/my-medium.cnf /etc/my.cnf

5. vi /etc/my.cnf

[client]

default-character-set=utf8

[mysqld]

character-set-server=utf8

lower_case_table_names=1

max_connections=1000

max_allowed_packet = 100M

[mysql]

default-character-set=utf8

Note: the settings listed above are the lines added to the copied my-medium.cnf (they were shown highlighted in the original post).

6. Place the mysql-connector-java-5.1.41-bin.jar driver file in Hive's lib directory (/bigdata/hive/lib).

7. Start the MySQL service

         service mysql start

8. Set a password for MySQL's root user

Set the password: mysqladmin -uroot password 'root123'

    Log in to MySQL: mysql -uroot -p

         Check the character set: show variables like 'character%';

9. Create the hive database and user

Create the hive database: create database hive default character set latin1;

Create the hive user: create user hive identified by 'hive';

Grant privileges to the hive user: grant all privileges on hive.* to 'hive'@'meritdata' identified by 'hive';

grant all privileges on hive.* to 'hive'@'%' identified by 'hive';

Refresh: flush privileges;

Step 8: Configure Hive

1. Configure the hive-site.xml file

cd /bigdata/hive/conf

vi hive-site.xml

The contents are as follows:

<configuration>

<!--metastore--> 

  <property>

   <name>datanucleus.autoCreateTables</name>

    <value>true</value>

  </property>

    <property>

   <name>javax.jdo.option.ConnectionURL</name>

   <value>jdbc:mysql://meritdata:3306/hive?createDatabaseIfNotExist=true</value>

  </property>

  <property>

   <name>javax.jdo.option.ConnectionUserName</name>

    <value>hive</value>

  </property>

  <property>

   <name>javax.jdo.option.ConnectionPassword</name>

    <value>hive</value>

  </property>

  <property>

   <name>javax.jdo.option.ConnectionDriverName</name>

   <value>com.mysql.jdbc.Driver</value>

  </property>

  <property>

   <name>hive.metastore.uris</name>

   <value>thrift://meritdata:9083</value>

  </property>

  <property>

   <name>hive.metastore.warehouse.dir</name>

    <value>/hive</value>

  </property>

<!--hiveserver2--> 

  <property>

    <name>hive.server2.thrift.bind.host</name>

    <value>meritdata</value>

  </property>

  <property>

   <name>hive.server2.thrift.port</name>

    <value>10000</value>

  </property>

</configuration>

2. Configure environment variables

vi ~/.bash_profile

Add the following lines:

export HIVE_HOME=/bigdata/hive

export PATH=$HIVE_HOME/bin:$PATH

export HIVE_CONF=$HIVE_HOME/conf

export HCAT_HOME=$HIVE_HOME/hcatalog

source ~/.bash_profile

3. Start Hive

Create the logs directory: mkdir /bigdata/hive/logs

Start the Hive services:

        nohup hive --service metastore > /bigdata/hive/logs/metastore.log 2>&1 &

        nohup hive --service hiveserver2 > /bigdata/hive/logs/hiveserver2.log 2>&1 &
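To confirm both services came up, you can check the listening ports and try a Beeline connection; this is a quick sketch, assuming the hostname meritdata and the default ports (9083 for the metastore, 10000 for HiveServer2):

        netstat -nltp | grep -E '9083|10000'
        beeline -u jdbc:hive2://meritdata:10000 -n bigdata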

4. Run a Hive test

Working with Hive is largely the same as working with MySQL; a short smoke test is sketched below.
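A minimal smoke test from the hive CLI; the table name test_tb is just an example:

        hive
        hive> show databases;
        hive> create table test_tb (id int, name string);
        hive> show tables;
        hive> describe test_tb;
        hive> drop table test_tb;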


Source: blog.csdn.net/qq_27366893/article/details/80052935