接上文:【翻译】Flume 1.8.0 User Guide(用户指南)
3.6.11 Sequence Generator Source
一个简单的序列生成器,它使用计数器连续生成事件,计数器从0开始,递增1,并在totalEvents停止。无法将事件发送到channel时重试。主要用于测试。在重试期间,它保持重试消息的主体与以前相同,这样,在目的地重复数据删除之后,唯一事件的数量预期将等于指定的totalEvents。所需属性以粗体显示。
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be seq |
selector.type | replicating or multiplexing | |
selector.* | replicating | Depends on the selector.type value |
interceptors | – | Space-separated list of interceptors |
interceptors.* | ||
batchSize | 1 | Number of events to attempt to process per request loop. |
totalEvents | Long.MAX_VALUE | Number of unique events sent by the source. |
agent a1 示例:
a1.sources = r1 a1.channels = c1 a1.sources.r1.type = seq a1.sources.r1.channels = c1
3.6.12 Syslog Sources
读取syslog数据并生成Flume事件。UDP源将整个消息视为单个事件。TCP源为用换行符(' n ')分隔的每个字符串创建一个新事件。
所需属性以粗体显示。
3.6.12.1 Syslog TCP Source
原始的、可靠的syslog TCP源.
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be syslogtcp |
host | – | Host name or IP address to bind to |
port | – | Port # to bind to |
eventSize | 2500 | Maximum size of a single event line, in bytes |
keepFields | none | Setting this to ‘all’ will preserve the Priority, Timestamp and Hostname in the body of the event. A spaced separated list of fields to include is allowed as well. Currently, the following fields can be included: priority, version, timestamp, hostname. The values ‘true’ and ‘false’ have been deprecated in favor of ‘all’ and ‘none’. |
selector.type | replicating or multiplexing | |
selector.* | replicating | Depends on the selector.type value |
interceptors | – | Space-separated list of interceptors |
interceptors.* |
例如,agent a1 syslog TCP source
a1.sources = r1 a1.channels = c1 a1.sources.r1.type = syslogtcp a1.sources.r1.port = 5140 a1.sources.r1.host = localhost a1.sources.r1.channels = c1
3.6.12.2 Multiport Syslog TCP Source
这是一个更新、更快、支持多端口的Syslog TCP源版本。请注意,端口配置设置已经替换了端口。多端口功能意味着它可以以一种有效的方式同时监听多个端口。这个源代码使用Apache Mina库来实现这一点。提供对RFC-3164和许多常见的RFC-5424格式消息的支持。还提供按端口配置所使用的字符集的功能。
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be multiport_syslogtcp |
host | – | Host name or IP address to bind to. |
ports | – | Space-separated list (one or more) of ports to bind to. |
eventSize | 2500 | Maximum size of a single event line, in bytes. |
keepFields | none | Setting this to ‘all’ will preserve the Priority, Timestamp and Hostname in the body of the event. A spaced separated list of fields to include is allowed as well. Currently, the following fields can be included: priority, version, timestamp, hostname. The values ‘true’ and ‘false’ have been deprecated in favor of ‘all’ and ‘none’. |
portHeader | – | If specified, the port number will be stored in the header of each event using the header name specified here. This allows for interceptors and channel selectors to customize routing logic based on the incoming port. |
charset.default | UTF-8 | Default character set used while parsing syslog events into strings. |
charset.port.<port> | – | Character set is configurable on a per-port basis. |
batchSize | 100 | Maximum number of events to attempt to process per request loop. Using the default is usually fine. |
readBufferSize | 1024 | Size of the internal Mina read buffer. Provided for performance tuning. Using the default is usually fine. |
numProcessors | (auto-detected) | Number of processors available on the system for use while processing messages. Default is to auto-detect # of CPUs using the Java Runtime API. Mina will spawn 2 request-processing threads per detected CPU, which is often reasonable. |
selector.type | replicating | replicating, multiplexing, or custom |
selector.* | – | Depends on the selector.type value |
interceptors | – | Space-separated list of interceptors. |
interceptors.* |
For example, a multiport syslog TCP source for agent named a1:
a1.sources = r1 a1.channels = c1 a1.sources.r1.type = multiport_syslogtcp a1.sources.r1.channels = c1 a1.sources.r1.host = 0.0.0.0 a1.sources.r1.ports = 10001 10002 10003 a1.sources.r1.portHeader = port
3.6.12.3 Syslog UDP Source
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be syslogudp |
host | – | Host name or IP address to bind to |
port | – | Port # to bind to |
keepFields | false | Setting this to true will preserve the Priority, Timestamp and Hostname in the body of the event. |
selector.type | replicating or multiplexing | |
selector.* | replicating | Depends on the selector.type value |
interceptors | – | Space-separated list of interceptors |
interceptors.* |
For example, a syslog UDP source for agent named a1:
a1.sources = r1 a1.channels = c1 a1.sources.r1.type = syslogudp a1.sources.r1.port = 5140 a1.sources.r1.host = localhost a1.sources.r1.channels = c1
3.6.13 HTTP source
通过HTTP POST和GET接受Flume事件的源。GET应该只用于实验。HTTP请求通过必须实现HTTPSourceHandler接口的可插入“处理程序”转换为flume事件。这个处理程序接受HttpServletRequest并返回一个flume事件列表。从一个Http请求处理的所有事件都提交给一个事务中的通道,从而提高了文件通道等通道的效率。如果处理程序抛出异常,该源将返回400的HTTP状态。如果channel已满,或者源无法向channel追加事件,源将返回一个HTTP 503—暂时不可用状态。
在一个post请求中发送的所有事件都被视为一个批处理,并在一个事务中插入到channel中。
Property Name | Default | Description |
---|---|---|
type | The component type name, needs to be http | |
port | – | The port the source should bind to. |
bind | 0.0.0.0 | The hostname or IP address to listen on |
handler | org.apache.flume.source.http.JSONHandler | The FQCN of the handler class. |
handler.* | – | Config parameters for the handler |
selector.type | replicating | replicating or multiplexing |
selector.* | Depends on the selector.type value | |
interceptors | – | Space-separated list of interceptors |
interceptors.* | ||
enableSSL | false | Set the property true, to enable SSL. HTTP Source does not support SSLv3. |
excludeProtocols | SSLv3 | Space-separated list of SSL/TLS protocols to exclude. SSLv3 is always excluded. |
keystore | Location of the keystore includng keystore file name | |
keystorePassword Keystore password |
例如, a http source for agent named a1:
a1.sources = r1 a1.channels = c1 a1.sources.r1.type = http a1.sources.r1.port = 5140 a1.sources.r1.channels = c1 a1.sources.r1.handler = org.example.rest.RestHandler a1.sources.r1.handler.nickname = random props
3.6.13.1 JSONHandler
提供了一个开箱即用的处理程序,它可以处理JSON格式表示的事件,并支持UTF-8、UTF-16和UTF-32字符集。处理程序接受事件数组(即使只有一个事件,也必须以数组的形式发送事件),并根据请求中指定的编码将其转换为Flume事件。如果没有指定编码,则假定为UTF-8。JSON处理程序支持UTF-8、UTF-16和UTF-32。事件表示如下:
[{ "headers" : { "timestamp" : "434324343", "host" : "random_host.example.com" }, "body" : "random_body" }, { "headers" : { "namenode" : "namenode.example.com", "datanode" : "random_datanode.example.com" }, "body" : "really_random_body" }]
要设置字符集,请求必须具有指定为application/json的内容类型;charset=UTF-8(根据需要将UTF-8替换为UTF-16或UTF-32)。
按照此处理程序所期望的格式创建事件的一种方法是使用Flume SDK中提供的JSONEvent,并使用谷歌Gson使用Gson#fromJson(对象、类型)方法创建JSON字符串。传递给事件列表的方法的第二个参数的类型令牌可以通过以下方式创建:
Type type = new TypeToken<List<JSONEvent>>() {}.getType();
3.6.13.2 BlobHandler
默认情况下,HTTPSource将JSON输入拆分为Flume事件。另一种选择是,BlobHandler是HTTPSource的处理程序,它返回一个事件,该事件包含请求参数以及随此请求上载的二进制大对象(BLOB)。例如PDF或JPG文件。注意,这种方法不适用于非常大的对象,因为它在RAM中缓冲整个BLOB。
Property Name | Default | Description |
---|---|---|
handler | – | The FQCN of this class: org.apache.flume.sink.solr.morphline.BlobHandler |
handler.maxBlobLength | 100000000 | The maximum number of bytes to read and buffer for a given request |
3.6.24 Stress Source
----------未完待续-------------