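VI. Installing flume-ng-1.5.0-cdh5.3.6

① What is Flume?

Put simply, Flume is a tool for collecting logs offline. It can ship the collected data to HDFS. In the most traditional pipeline, once the logs are on HDFS, a MapReduce job cleans them, and the cleaned data is usually loaded into Hive as a raw table. A large number of Hive ETL jobs, that is, Hive SQL, then run against that table under various schedulers, processing the raw data in different ways to build a Hive-based data warehouse. Only then can Spark, Hive, and MapReduce jobs run their computations on top of that warehouse.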

② Installing Flume

1. Use WinSCP to copy flume-ng-1.5.0-cdh5.3.6.tar.gz into the /usr/local/ directory on the sparkproject1 virtual machine.

2. Extract the archive. Change into the local directory: cd /usr/local/

Then run: tar -zxvf flume-ng-1.5.0-cdh5.3.6.tar.gz

Still in the local directory, delete the archive: rm -rf flume-ng-1.5.0-cdh5.3.6.tar.gz

3. In the local directory, run ll. The extracted folder is named apache-flume-1.5.0-cdh5.3.6-bin; rename it.

In the local directory, run: mv apache-flume-1.5.0-cdh5.3.6-bin flume

Run ll again: the apache-flume-1.5.0-cdh5.3.6-bin folder is gone and a flume folder has taken its place.

4. Configure the Flume environment variables. In the local directory, run: vi ~/.bashrc

Press i to enter insert mode and add the following lines:

export FLUME_HOME=/usr/local/flume

export FLUME_CONF_DIR=$FLUME_HOME/conf

export PATH=$PATH:......:$FLUME_HOME/bin

Here ...... stands for the entries already on PATH. The complete file ends up as:

export JAVA_HOME=/usr/java/latest

export HADOOP_HOME=/usr/local/hadoop

export HIVE_HOME=/usr/local/hive

export ZOOKEEPER_HOME=/usr/local/zk

export SCALA_HOME=/usr/local/scala

export FLUME_HOME=/usr/local/flume

export FLUME_CONF_DIR=$FLUME_HOME/conf

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$FLUME_HOME/bin

Press Esc, then type :wq to save and exit.

5. Reload the environment variables. In the local directory, run: source ~/.bashrc
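
Steps 2 to 5 above can be collected into two small shell functions. This is only a minimal sketch, not part of Flume: install_flume and check_flume_env are hypothetical helper names, and the default extracted directory name is the one reported in step 3. Unlike step 2, the sketch leaves the archive in place.

```shell
# Sketch: unpack a Flume tarball and rename the extracted directory.
# install_flume is a hypothetical helper, not something Flume ships.
install_flume() {
    archive=$1                                       # path to the .tar.gz archive
    dest=$2                                          # parent directory, /usr/local in this article
    extracted=${3:-apache-flume-1.5.0-cdh5.3.6-bin}  # directory name inside the archive

    if [ -e "$dest/flume" ]; then
        echo "refusing to overwrite existing $dest/flume" >&2
        return 1
    fi
    tar -zxf "$archive" -C "$dest" || return 1
    mv "$dest/$extracted" "$dest/flume" || return 1
    echo "installed to $dest/flume"
}

# Sketch: verify the variables from step 4 after `source ~/.bashrc`.
# check_flume_env is a hypothetical helper name.
check_flume_env() {
    if [ -z "${FLUME_HOME:-}" ]; then
        echo "FLUME_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$FLUME_HOME/bin/flume-ng" ]; then
        echo "flume-ng not found under $FLUME_HOME/bin" >&2
        return 1
    fi
    # command -v resolves flume-ng through PATH, confirming the PATH entry took effect.
    if ! command -v flume-ng >/dev/null 2>&1; then
        echo "flume-ng is not on PATH" >&2
        return 1
    fi
    echo "FLUME_HOME=$FLUME_HOME"
    echo "FLUME_CONF_DIR=${FLUME_CONF_DIR:-unset}"
}
```

With the paths used in this article, the calls would be install_flume /usr/local/flume-ng-1.5.0-cdh5.3.6.tar.gz /usr/local and, after reloading ~/.bashrc, check_flume_env.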

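③ Editing the Flume configuration file

1. In the local directory, run: cd flume/. You will see a conf folder.

From the flume directory, run: cd conf/. It contains flume-conf.properties.template, which is a template file.

Rename flume-conf.properties.template to flume-conf.properties. In the conf directory, run: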

mv flume-conf.properties.template flume-conf.properties

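Then run ll in the conf directory to check that the rename succeeded.

2. In the conf directory, run: vi flume-conf.properties

Press i to enter insert mode, then delete all of the following lines:

agent.sources = seqGenSrc

agent.channels = memoryChannel

agent.sinks = loggerSink

and replace them with: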

agent1.sources = source1

agent1.sinks = sink1

agent1.channels = channel1

3. Find the following section in flume-conf.properties:

#For each one of the sources ,the type is defined

agent.sources.seqGenSrc.type = seq

#The channel can be defined as follows

agent.sources.seqGenSrc.channels = memoryChannel

Remove the last three lines:

agent.sources.seqGenSrc.type = seq

#The channel can be defined as follows

agent.sources.seqGenSrc.channels = memoryChannel

so that only the comment #For each one of the sources ,the type is defined remains.

Then, directly below that comment, add:

agent1.sources.source1.type=spooldir

agent1.sources.source1.spoolDir=/usr/local/logs

agent1.sources.source1.channels=channel1

agent1.sources.source1.fileHeader = false

agent1.sources.source1.interceptors = i1

agent1.sources.source1.interceptors.i1.type = timestamp

4. Find the following section in flume-conf.properties:

#Each channel's type is defined

agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)

# Can be defined as well

# In this case , it specifies the capacity of the memory channel

agent.channels.memoryChannel.capacity = 100

Remove the five lines after the first comment so that only this line remains:

#Each channel's type is defined

The section should end up as:

#Each channel's type is defined

agent1.channels.channel1.type=file

agent1.channels.channel1.checkpointDir=/usr/local/logs_tmp_cp

agent1.channels.channel1.dataDirs=/usr/local/logs_tmp

5. Find the following section in flume-conf.properties:

#Each sink's type must be defined

agent.sinks.loggerSink.type = logger

#Specify the channel the sink should use

agent.sinks.loggerSink.channel = memoryChannel

Remove the three lines after the first comment so that only #Each sink's type must be defined remains.

The section should end up as:

#Each sink's type must be defined

agent1.sinks.sink1.type=hdfs

agent1.sinks.sink1.hdfs.path=hdfs://sparkproject1:9000/logs

agent1.sinks.sink1.hdfs.fileType=DataStream

agent1.sinks.sink1.hdfs.writeFormat=TEXT

agent1.sinks.sink1.hdfs.rollInterval=1

agent1.sinks.sink1.channel=channel1

agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d

6. Press Esc, then type :wq to save and exit.
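
For reference, the agent1 definition built up in steps 2 to 5 can be written out in one go. The following is a sketch under stated assumptions: write_flume_conf is a hypothetical helper name, the property values are copied from the steps above, the comments inside the file are added here for explanation, and the rest of the template file is not reproduced.

```shell
# Sketch: write the agent1 definition from steps 2-5 to the file given as $1.
# write_flume_conf is a hypothetical helper; it overwrites the target file.
write_flume_conf() {
    cat > "$1" <<'EOF'
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Spooling directory source: ingests files placed in /usr/local/logs
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /usr/local/logs
agent1.sources.source1.channels = channel1
agent1.sources.source1.fileHeader = false
# The timestamp interceptor adds the header that %Y-%m-%d in filePrefix relies on
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = timestamp

# File channel: buffers events on local disk between source and sink
agent1.channels.channel1.type = file
agent1.channels.channel1.checkpointDir = /usr/local/logs_tmp_cp
agent1.channels.channel1.dataDirs = /usr/local/logs_tmp

# HDFS sink: writes events as a plain data stream under /logs
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://sparkproject1:9000/logs
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = TEXT
agent1.sinks.sink1.hdfs.rollInterval = 1
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.hdfs.filePrefix = %Y-%m-%d
EOF
}
```

Writing to a scratch path first, for example write_flume_conf /tmp/flume-conf.properties, makes it easy to compare the result with the hand-edited file before replacing anything.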

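④ Creating the required folders

1. Create the local folder /usr/local/logs.

In the /usr/local/ directory on the sparkproject1 virtual machine, run: mkdir logs

Then run ll: the local directory now contains a logs folder.

2. Create the HDFS folder /logs.

In the /usr/local/ directory on the sparkproject1 virtual machine, run: hdfs dfs -mkdir /logs

Then, still in the local directory, run: hdfs dfs -ls /, which now lists the /logs folder.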
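
Both folders can also be created in a way that is safe to repeat. This is a minimal sketch, assuming the hdfs client is on PATH on the machine where it runs; prepare_dirs is a hypothetical helper name, and its defaults are the two paths used in this article.

```shell
# Sketch: create the spooling directory and the HDFS target directory idempotently.
# prepare_dirs is a hypothetical helper name.
prepare_dirs() {
    spool_dir=${1:-/usr/local/logs}   # local directory watched by source1
    hdfs_dir=${2:-/logs}              # HDFS directory written by sink1

    # -p makes mkdir succeed even when the directory already exists.
    mkdir -p "$spool_dir" || return 1

    # Only touch HDFS when the hdfs client is installed and on PATH.
    if command -v hdfs >/dev/null 2>&1; then
        hdfs dfs -mkdir -p "$hdfs_dir" || return 1
    else
        echo "hdfs command not found; skipped creating $hdfs_dir" >&2
    fi
}
```

Calling prepare_dirs with no arguments uses /usr/local/logs and /logs.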

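⑤ Starting the Flume agent

In the /usr/local directory on the sparkproject1 virtual machine, run: flume-ng agent -n agent1 -c conf -f /usr/local/flume/conf/flume-conf.properties -Dflume.root.logger=DEBUG,console

The agent starts and keeps running in the foreground, monitoring the spooling directory. Now open a second terminal window.

In the second terminal on sparkproject1, run: cd /usr/local/

Then run: vi ids

Press i to enter insert mode and type some content.

Press Esc, then type :wq to save and exit.

Then run: mv ids logs

The first terminal now shows new output: the agent picks up the file and uploads it to the configured HDFS directory.

In the second terminal, from the /usr/local/ directory, run: hdfs dfs -lsr /logs. One file is now listed.

Then, from the local directory, run: hdfs dfs -text /logs/2015-11-18.1447820220026. The file name 2015-11-18.1447820220026 will be different on your system.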
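
The hand-off of a finished file to the spooling directory can be sketched as a small function. It is an illustration only: spool_file is a hypothetical helper name, and the check for the .COMPLETED suffix assumes the spooling directory source is running with its default file suffix, which the configuration above does not override.

```shell
# Sketch: move a finished file into the spooling directory.
# spool_file is a hypothetical helper name.
spool_file() {
    src=$1
    spool_dir=${2:-/usr/local/logs}
    name=$(basename "$src")

    # The spooling directory source expects unique file names,
    # so refuse a name that is pending or was already processed.
    if [ -e "$spool_dir/$name" ] || [ -e "$spool_dir/$name.COMPLETED" ]; then
        echo "$name already exists in $spool_dir" >&2
        return 1
    fi
    # On the same filesystem mv is a rename, so the file appears
    # in the spooling directory only once it is completely written.
    mv "$src" "$spool_dir/$name"
}
```

For the test above, the equivalent call would be spool_file /usr/local/ids, run after the ids file has been saved.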

See the attachment for details.
