ElasticSearch is a distributed, scalable, real-time search and data analytics engine. How to write data from massive sources into ElasticSearch efficiently and reliably is therefore an unavoidable problem.
Logstash Concepts and Principles
Logstash is an open-source, server-side data processing pipeline that can dynamically collect, transform, and ship data from multiple sources at once into ElasticSearch indices, where the data can then be tokenized, searched, and analyzed, regardless of its format or complexity. It provides a rich library of filters: for example, it can derive structure from unstructured data with Grok, decode geographic coordinates from IP addresses, and anonymize or exclude sensitive fields, simplifying the overall processing.
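As a minimal sketch of those capabilities (the field names clientip, user_email, and password are assumptions for illustration), a filter section could look like this:

```
filter {
  geoip {
    source => "clientip"            # decode geographic coordinates from an IP field
  }
  fingerprint {
    source => "user_email"          # anonymize a sensitive field by hashing it in place
    target => "user_email"          # (field names are assumed, not from the source)
    method => "SHA256"
  }
  mutate {
    remove_field => ["password"]    # exclude a sensitive field entirely
  }
}
```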
Logstash Use Cases
1. Logstash acts directly as the collector on the client side, parsing, transforming, and storing the data (Logstash is fairly heavyweight and consumes considerable resources).
2. Beats collects the client-side data, and Logstash further gathers, analyzes, and transforms what Beats ships.
3. Logstash subscribes to Kafka messages and parses and transforms the data.
Solutions:
1. Data source (e.g. MySQL data) -> Logstash -> output (to ElasticSearch, files, Kafka, Redis, ...)
2. Data source -> Beats (e.g. Filebeat) -> Logstash -> output
3. Data source -> Beats -> Kafka (or Redis) -> Logstash -> output
4. Kafka (or Redis) -> Logstash -> output
Logstash for Kafka message subscription, parsing, and ElasticSearch storage
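A minimal pipeline sketch for this scenario; the broker address, topic, timestamp field, and index name below are placeholder assumptions:

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # assumed Kafka broker address
    topics => ["app-logs"]                  # assumed topic name
    group_id => "logstash"
    codec => "json"                         # assumes messages arrive as JSON
  }
}
filter {
  date {
    match => ["timestamp", "ISO8601"]       # assumes a JSON field named "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"      # one index per day
  }
}
```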
Logstash for Filebeat data collection, cleansing, and ElasticSearch storage
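A sketch of the Beats scenario; port 5044 is the conventional Filebeat-to-Logstash port, and the grok pattern assumes a simple "timestamp level message" log line:

```
input {
  beats {
    port => 5044                            # default port Filebeat ships to
  }
}
filter {
  grok {
    # assumes lines like: 2023-01-01 12:00:00 INFO something happened
    match => { "message" => "%{TIMESTAMP_ISO8601:logtime} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  date {
    match => ["logtime", "yyyy-MM-dd HH:mm:ss"]  # use the parsed time as @timestamp
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "filebeat-%{+YYYY.MM.dd}"
  }
}
```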
Logstash for MySQL data collection, parsing, and ElasticSearch storage
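A sketch using the jdbc input plugin; the driver path, credentials, and the orders table and its columns are assumptions:

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"   # assumed driver location
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb" # assumed database
    jdbc_user => "root"
    jdbc_password => "secret"
    schedule => "* * * * *"                                      # poll once a minute
    statement => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"                              # incremental-sync marker
    tracking_column_type => "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "orders"
    document_id => "%{id}"    # assumes a primary-key column named id, so updates overwrite
  }
}
```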
Logstash's Filter Plugin Library
Plugin | Description
------ | -----------
aggregate | Aggregates information from several events originating with a single task
alter | Performs general alterations to fields that the mutate filter does not handle
bytes | Parses string representations of computer storage sizes, such as "123 MB" or "5.6gb", into their numeric value in bytes
cidr | Checks IP addresses against a list of network blocks
cipher | Applies or removes a cipher to an event
clone | Duplicates events
csv | Parses comma-separated value data into individual fields
date | Parses dates from fields to use as the Logstash timestamp for an event
de_dot | Computationally expensive filter that removes dots from a field name
dissect | Extracts unstructured event data into fields using delimiters
dns | Performs a standard or reverse DNS lookup
drop | Drops all events
elapsed | Calculates the elapsed time between a pair of events
elasticsearch | Copies fields from previous log events in Elasticsearch to current events
environment | Stores environment variables as metadata sub-fields
extractnumbers | Extracts numbers from a string
fingerprint | Fingerprints fields by replacing values with a consistent hash
geoip | Adds geographical information about an IP address
grok | Parses unstructured event data into fields
http | Provides integration with external web services/REST APIs
i18n | Removes special characters from a field
java_uuid | Generates a UUID and adds it to each processed event
jdbc_static | Enriches events with data pre-loaded from a remote database
jdbc_streaming | Enriches events with your database data
json | Parses JSON events
json_encode | Serializes a field to JSON
kv | Parses key-value pairs
memcached | Provides integration with external data in Memcached
metricize | Takes complex events containing a number of metrics and splits these up into multiple events, each holding a single metric
metrics | Aggregates metrics
mutate | Performs mutations on fields
prune | Prunes event data based on a list of fields to blacklist or whitelist
range | Checks that specified fields stay within given size or length limits
ruby | Executes arbitrary Ruby code
sleep | Sleeps for a specified time span
split | Splits multi-line messages into distinct events
syslog_pri | Parses the PRI (priority) field of a syslog message
threats_classifier | Enriches security logs with information about the attacker's intent
throttle | Throttles the number of events
tld | Replaces the contents of the default message field with whatever you specify in the configuration
translate | Replaces field contents based on a hash or YAML file
truncate | Truncates fields longer than a given length
urldecode | Decodes URL-encoded fields
useragent | Parses user agent strings into fields
uuid | Adds a UUID to events
xml | Parses XML into fields
grok can parse and structure arbitrary text via regular expressions, and it is currently the best way in Logstash to turn unstructured log data into a structured, queryable form. Beyond that, Logstash can rename, remove, replace, and modify event fields, and it can of course drop events entirely, such as debug events. Many more sophisticated capabilities are available as well.
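For instance, a short sketch using grok's stock COMBINEDAPACHELOG pattern, which splits an Apache access-log line into fields such as clientip, verb, request, and response:

```
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }  # stock pattern for Apache access logs
  }
  mutate {
    remove_field => ["message"]                       # drop the raw line once it is parsed
  }
}
```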
Flume focuses on data transport, and its users must understand the entire data route very clearly. It is comparatively more reliable: its channel exists for persistence, and data is deleted only once delivery to the next destination has been confirmed.
Logstash focuses on data preprocessing: log fields are preprocessed first and then parsed.