ElasticSearch is a distributed, scalable, real-time search and data analytics engine. How to write data from massive sources into ElasticSearch efficiently and reliably is therefore an unavoidable problem.
Logstash Concepts and Principles
Logstash is an open-source, server-side data processing pipeline that can dynamically collect, transform, and ship data from multiple sources at once into ElasticSearch indices, where the data can then be tokenized, searched, and analyzed, regardless of its format or complexity. It provides a rich library of filters: for example, it can derive structure from unstructured data with Grok, decode geographic coordinates from IP addresses, and anonymize or exclude sensitive fields, simplifying the overall processing.
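As a minimal sketch of those capabilities (the field names clientip, user_email, and password are assumptions for illustration), a filter section could look like this:

```
filter {
  geoip {
    source => "clientip"            # decode geographic coordinates from an IP field
  }
  fingerprint {
    source => "user_email"          # anonymize a sensitive field by hashing it in place
    target => "user_email"          # (field names are assumed, not from the source)
    method => "SHA256"
  }
  mutate {
    remove_field => ["password"]    # exclude a sensitive field entirely
  }
}
```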
Logstash Use Cases
1. Logstash acts directly as the collector on the client side, parsing, transforming, and storing the data (Logstash is fairly heavyweight and consumes considerable resources).
2. Beats collects the client-side data, and Logstash further gathers, analyzes, and transforms what Beats ships.
3. Logstash subscribes to Kafka messages and parses and transforms the data.
Solutions:
1. Data source (e.g. MySQL data) -> Logstash -> output (to ElasticSearch, files, Kafka, Redis, ...)
2. Data source -> Beats (e.g. Filebeat) -> Logstash -> output
3. Data source -> Beats -> Kafka (or Redis) -> Logstash -> output
4. Kafka (or Redis) -> Logstash -> output
Logstash for Kafka message subscription, parsing, and ElasticSearch storage
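A minimal pipeline sketch for this scenario; the broker address, topic, timestamp field, and index name below are placeholder assumptions:

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # assumed Kafka broker address
    topics => ["app-logs"]                  # assumed topic name
    group_id => "logstash"
    codec => "json"                         # assumes messages arrive as JSON
  }
}
filter {
  date {
    match => ["timestamp", "ISO8601"]       # assumes a JSON field named "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"      # one index per day
  }
}
```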
Logstash for Filebeat data collection, cleansing, and ElasticSearch storage
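A sketch of the Beats scenario; port 5044 is the conventional Filebeat-to-Logstash port, and the grok pattern assumes a simple "timestamp level message" log line:

```
input {
  beats {
    port => 5044                            # default port Filebeat ships to
  }
}
filter {
  grok {
    # assumes lines like: 2023-01-01 12:00:00 INFO something happened
    match => { "message" => "%{TIMESTAMP_ISO8601:logtime} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  date {
    match => ["logtime", "yyyy-MM-dd HH:mm:ss"]  # use the parsed time as @timestamp
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "filebeat-%{+YYYY.MM.dd}"
  }
}
```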
Logstash for MySQL data collection, parsing, and ElasticSearch storage
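A sketch using the jdbc input plugin; the driver path, credentials, and the orders table and its columns are assumptions:

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"   # assumed driver location
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb" # assumed database
    jdbc_user => "root"
    jdbc_password => "secret"
    schedule => "* * * * *"                                      # poll once a minute
    statement => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"                              # incremental-sync marker
    tracking_column_type => "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "orders"
    document_id => "%{id}"    # assumes a primary-key column named id, so updates overwrite
  }
}
```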
Logstash's Filter Plugin Library
Plugin | Description
------ | -----------
aggregate | Aggregates information from several events originating with a single task
alter | Performs general alterations to fields that the mutate filter does not handle
bytes | Parses string representations of computer storage sizes, such as "123 MB" or "5.6gb", into their numeric value in bytes
cidr | Checks IP addresses against a list of network blocks
cipher | Applies or removes a cipher to an event
clone | Duplicates events
csv | Parses comma-separated value data into individual fields
date | Parses dates from fields to use as the Logstash timestamp for an event
de_dot | Computationally expensive filter that removes dots from a field name
dissect | Extracts unstructured event data into fields using delimiters
dns | Performs a standard or reverse DNS lookup
drop | Drops all events
elapsed | Calculates the elapsed time between a pair of events
elasticsearch | Copies fields from previous log events in Elasticsearch to current events
environment | Stores environment variables as metadata sub-fields
extractnumbers | Extracts numbers from a string
fingerprint | Fingerprints fields by replacing values with a consistent hash
geoip | Adds geographical information about an IP address
grok | Parses unstructured event data into fields
http | Provides integration with external web services/REST APIs
i18n | Removes special characters from a field
java_uuid | Generates a UUID and adds it to each processed event
jdbc_static | Enriches events with data pre-loaded from a remote database
jdbc_streaming | Enriches events with your database data
json | Parses JSON events
json_encode | Serializes a field to JSON
kv | Parses key-value pairs
memcached | Provides integration with external data in Memcached
metricize | Takes complex events containing a number of metrics and splits these up into multiple events, each holding a single metric
metrics | Aggregates metrics
mutate | Performs mutations on fields
prune | Prunes event data based on a list of fields to blacklist or whitelist
range | Checks that specified fields stay within given size or length limits
ruby | Executes arbitrary Ruby code
sleep | Sleeps for a specified time span
split | Splits multi-line messages into distinct events
syslog_pri | Parses the PRI (priority) field of a syslog message
threats_classifier | Enriches security logs with information about the attacker's intent
throttle | Throttles the number of events
tld | Replaces the contents of the default message field with whatever you specify in the configuration
translate | Replaces field contents based on a hash or YAML file
truncate | Truncates fields longer than a given length
urldecode | Decodes URL-encoded fields
useragent | Parses user agent strings into fields
uuid | Adds a UUID to events
xml | Parses XML into fields
grok can parse and structure arbitrary text via regular expressions, and it is currently the best way in Logstash to turn unstructured log data into a structured, queryable form. Beyond that, Logstash can rename, remove, replace, and modify event fields, and it can of course drop events entirely, such as debug events. Many more sophisticated capabilities are available as well.
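For instance, a short sketch using grok's stock COMBINEDAPACHELOG pattern, which splits an Apache access-log line into fields such as clientip, verb, request, and response:

```
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }  # stock pattern for Apache access logs
  }
  mutate {
    remove_field => ["message"]                       # drop the raw line once it is parsed
  }
}
```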
Flume focuses on data transport, and its users must understand the entire data route very clearly. It is comparatively more reliable: its channel exists for persistence, and data is deleted only once delivery to the next destination has been confirmed.
Logstash focuses on data preprocessing: log fields are preprocessed first and then parsed.