Hadoop (8): Flume Configuration

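Flume, a real-time log collection system, comes in two generations: Flume OG and Flume NG. Flume NG is the successor to Flume OG: simpler, smaller, and easier to deploy. Flume NG is not guaranteed to be backward compatible with Flume OG, so if you are just getting started, use Flume NG.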

Core Flume concepts

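  • Event: a single unit of data that Flume NG can transport. Events are similar to messages in JMS and comparable messaging systems and are usually small (from a few bytes to a few kilobytes). An event is typically a single record in a larger data set. It consists of headers and a body: the headers are a key/value map, and the body is an arbitrary byte array.
  • Source: the component through which Flume NG receives data. A source can be pollable or event-driven. It collects data from a Client and passes it to a Channel.
  • Sink: the destination of data in Flume NG. A sink takes events from a Channel and sends the data elsewhere, for example to a file system, a database, or Hadoop.
  • Channel: the staging area between a Source and a Sink. When the source produces events faster than the sink consumes them, the channel buffers them.
  • Agent: an independent Flume process that contains the Source, Channel, and Sink components. An agent runs Flume inside a JVM. Typically each machine runs one agent, but a single agent can contain multiple sources and sinks.
  • Client: produces the data and runs in a separate thread.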
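
To make these roles concrete, here is a minimal Python sketch of the source, channel, sink flow. It is only an illustrative model and not Flume's API: the names Event, run_source, and run_sink are hypothetical, and a bounded queue.Queue stands in for a memory channel.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy model of a Flume event: a header map plus an opaque byte body."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def run_source(lines, channel):
    """Hypothetical source: wrap each input line in an Event and put it on the channel."""
    for line in lines:
        channel.put(Event(headers={}, body=line.encode("utf-8")))


def run_sink(channel):
    """Hypothetical sink: drain the channel and return the event bodies it received."""
    delivered = []
    while not channel.empty():
        delivered.append(channel.get().body)
    return delivered


# The bounded queue plays the role of the channel: it buffers events
# when the source produces faster than the sink consumes.
channel = queue.Queue(maxsize=100)
run_source(["first record", "second record"], channel)
print(run_sink(channel))
```

In this model the source and sink never talk to each other directly; they only share the channel, which is the same decoupling a Flume agent relies on.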

Download: http://flume.apache.org/download.html

After the download finishes, upload the archive to your Linux server and extract it:

tar -zxvf apache-flume-1.8.0-bin.tar.gz

Go to the conf directory inside the extracted directory. It contains a template, flume-conf.properties.template; copy it and edit the copy:

cp flume-conf.properties.template flume.conf
vim flume.conf  

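Flume uses a configuration format based on Java properties files. When you start the agent, you tell Flume which file to use through a command-line option.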

Here is a basic example. Copy the following into flume.conf:


# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory

# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414

# Define a logger sink that simply logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = logger

# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1

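This example creates a memory channel, an Avro RPC source, and a logger sink, and wires them together.
Any event received by the Avro source is routed to channel ch1 and then delivered to the logger sink.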
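
The wiring rules in this file can be checked mechanically. The following Python sketch parses a Flume-style properties text and reports sources or sinks that reference a channel the agent does not declare. The helper names parse_properties and check_wiring are hypothetical, and the parser only handles the simple key = value lines used here, not the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems: components pointing at undeclared channels."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for name in props.get(f"{agent}.sources", "").split():
        # A source may feed several channels (space-separated list).
        for ch in props.get(f"{agent}.sources.{name}.channels", "").split():
            if ch not in declared:
                problems.append(f"source {name} -> unknown channel {ch!r}")
    for name in props.get(f"{agent}.sinks", "").split():
        # A sink reads from exactly one channel.
        ch = props.get(f"{agent}.sinks.{name}.channel", "")
        if ch not in declared:
            problems.append(f"sink {name} -> unknown channel {ch!r}")
    return problems


CONFIG = """
agent1.channels.ch1.type = memory
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = logger
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
"""

print(check_wiring(parse_properties(CONFIG), "agent1"))  # prints []
```

Note the asymmetry the check relies on: a source uses the plural key channels, while a sink uses the singular key channel.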

Start Flume

bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1

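--conf specifies the configuration directory
-f specifies the agent configuration file (short for --conf-file); in this example the file lives in the directory passed to --conf
-Dflume.root.logger sets the log level and output target (here DEBUG, printed to the console)
-n specifies the agent name, which must match the prefix used in the configuration file (agent1 here)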

You should see log output like this:

2018-06-08 15:06:08,636 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:141)] Polling sink runner starting
2018-06-08 15:06:09,246 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SOURCE, name: avro-source1: Successfully registered new MBean.
2018-06-08 15:06:09,246 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SOURCE, name: avro-source1 started
2018-06-08 15:06:09,262 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:260)] Avro source avro-source1 started.
2018-06-08 15:06:38,635 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:127)] Checking file:conf/flume.conf for changes
2018-06-08 15:07:08,637 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:127)] Checking file:conf/flume.conf for changes

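The log shows that the Avro source has started (Avro source avro-source1 started) and that Flume periodically checks the configuration file for changes (Checking file:conf/flume.conf for changes), every 30 seconds in the output above.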

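As a quick test, the following command creates one event per line of /etc/passwd (that is, one per Linux user) and sends the events to Flume's Avro source on localhost:41414.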

Enter the following in a new terminal window:

bin/flume-ng avro-client --conf conf -H localhost -p 41414 -F /etc/passwd -Dflume.root.logger=DEBUG,console

That window shows the following log:

2018-06-08 15:20:04,243 (main) [WARN - org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:634)] Using default maxIOWorkers
2018-06-08 15:20:04,680 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:233)] Finished
2018-06-08 15:20:04,680 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:236)] Closing reader
2018-06-08 15:20:04,681 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:240)] Closing RPC client
2018-06-08 15:20:04,689 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:84)] Exiting

The original window, where the agent is running, shows the logged events:

2018-06-08 15:20:06,800 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 72 6F 6F 74 3A 78 3A 30 3A 30 3A 72 6F 6F 74 3A root:x:0:0:root: }
2018-06-08 15:20:06,801 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 62 69 6E 3A 78 3A 31 3A 31 3A 62 69 6E 3A 2F 62 bin:x:1:1:bin:/b }
2018-06-08 15:20:06,801 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 64 61 65 6D 6F 6E 3A 78 3A 32 3A 32 3A 64 61 65 daemon:x:2:2:dae }
2018-06-08 15:20:06,801 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 61 64 6D 3A 78 3A 33 3A 34 3A 61 64 6D 3A 2F 76 adm:x:3:4:adm:/v }
2018-06-08 15:20:06,801 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 6C 70 3A 78 3A 34 3A 37 3A 6C 70 3A 2F 76 61 72 lp:x:4:7:lp:/var }
2018-06-08 15:20:06,802 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 73 79 6E 63 3A 78 3A 35 3A 30 3A 73 79 6E 63 3A sync:x:5:0:sync: }
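
Each log line above shows the event body twice, as hexadecimal bytes and as text, cut off after 16 bytes. The following Python sketch mimics that rendering for one event per input line. It approximates the output format seen in this log and is not Flume's own code; format_event and its max_bytes parameter are hypothetical names, and the sample lines are illustrative (only their first 16 bytes appear in the log above).

```python
def format_event(body: bytes, max_bytes: int = 16) -> str:
    """Render a body like the logger sink output above: hex bytes, then text."""
    head = body[:max_bytes]
    hex_part = " ".join(f"{b:02X}" for b in head)
    # Show printable ASCII as-is and replace everything else with a dot.
    text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in head)
    return f"Event: {{ headers:{{}} body: {hex_part} {text_part} }}"


# One event per line, as the avro-client does with the lines of a file.
sample = "root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n"
for line in sample.splitlines():
    print(format_event(line.encode("utf-8")))
```

This explains why the logged text stops at "root:x:0:0:root:": the sink prints only the beginning of each body, while the full line is still carried in the event.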

Congratulations! Apache Flume is now up and running.

Reposted from blog.csdn.net/mingyunxiaohai/article/details/80585280