- Overview

A module in Filebeat is a way to parse a specific log file format for a particular piece of software.
- Pipeline

A pipeline is a definition of a series of processors that are to be executed in the same order as they are declared. A pipeline consists of two main fields:
  - a description: the description is a special field to store a helpful description of what the pipeline does
  - a list of processors: the processors parameter defines the list of processors to be executed in order
```json
{
  "description": "...",
  "processors": [ ... ]
}
```
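As a concrete sketch, a small pipeline that parses Apache access logs could look like this (the pipeline name, grok pattern, and field names are illustrative, not from the original text):

```console
PUT _ingest/pipeline/apache-access
{
  "description": "Parse Apache access logs into structured fields",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{COMMONAPACHELOG}"]
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}
```

The processors run in order: first the grok processor extracts structured fields from the raw line, then the remove processor drops the now-redundant original message.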
- Painless scripting language
Painless is a simple, secure scripting language designed specifically for use with Elasticsearch.
It is the default scripting language for Elasticsearch and can safely be used for inline and stored scripts.
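For example, a minimal inline Painless script in an update request (the index name, field, and parameters here are hypothetical):

```console
POST my-index/_update/1
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.counter += params.count",
    "params": {
      "count": 4
    }
  }
}
```

Passing values through `params` rather than hard-coding them in `source` lets Elasticsearch reuse the compiled script.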
- Mapping

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. A mapping definition has:
  - Metadata fields, which are used to customize how a document's associated metadata is treated. Examples of metadata fields include the document's _index, _id, and _source fields.
  - A list of fields or properties pertinent to the document. Each field has its own data type.

Defining too many fields in an index can lead to a mapping explosion, which can cause out-of-memory errors and situations that are difficult to recover from.

There are two ways to implement mapping:
  - Dynamic mapping: one of the most important features of Elasticsearch is that it tries to get out of your way and let you start exploring your data as quickly as possible. To index a document, you don't have to first create an index, define a mapping type, and define your fields; you can just index a document, and the index, type, and fields will spring to life automatically. This automatic detection and addition of new fields is called dynamic mapping.
  - Explicit mapping: you define the fields and their data types yourself, typically when creating the index.
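The two approaches can be sketched as follows (index and field names are illustrative). With dynamic mapping you simply index a document and let Elasticsearch infer the field types; with explicit mapping you declare them up front:

```console
# Dynamic mapping: Elasticsearch infers "message" as text,
# "attempts" as long, and "timestamp" as date.
PUT my-index-dynamic/_doc/1
{
  "message": "login failed",
  "attempts": 3,
  "timestamp": "2021-01-19T10:00:00Z"
}

# Explicit mapping: field types are declared when the index is created.
PUT my-index-explicit
{
  "mappings": {
    "properties": {
      "message":   { "type": "text" },
      "attempts":  { "type": "long" },
      "timestamp": { "type": "date" }
    }
  }
}
```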
- Data replication model

Each index in Elasticsearch is divided into shards, and each shard can have multiple copies. These copies are known as a replication group and must be kept in sync when documents are added or removed. The process of keeping the shard copies in sync and serving reads from them is what we call the data replication model.

This model is based on having a single copy from the replication group that acts as the primary shard; the other copies are called replica shards.
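The number of shards and replicas is configured per index; a minimal sketch (index name and values are arbitrary):

```console
PUT my-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```

With these settings, each of the three primary shards has one replica, so each replication group contains two copies.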
- Ingest node
The built-in modules almost entirely use the Ingest node feature of Elasticsearch instead of the Beats processors. One of the most helpful parts of the ingest pipeline is the ability to debug with the Simulate Pipeline API.

The simulate pipeline API executes a specific pipeline against a set of documents provided in the body of the request. You can either specify an existing pipeline to execute against the provided documents or supply a pipeline definition in the body of the request.

You can use the simulate pipeline API to see how each processor affects the ingest document as it passes through the pipeline. To see the intermediate results of each processor in the simulate request, add the verbose parameter to the request.

The ingest pipeline works at the document level; you still need to check for multi-line events such as exceptions where the logs are generated and let Filebeat create a single message out of them.
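A minimal simulate request, using a hypothetical inline pipeline with a single lowercase processor on a user field:

```console
POST _ingest/pipeline/_simulate?verbose=true
{
  "pipeline": {
    "description": "Lowercase the user field",
    "processors": [
      { "lowercase": { "field": "user" } }
    ]
  },
  "docs": [
    { "_source": { "user": "ALICE" } },
    { "_source": { "user": "Bob" } }
  ]
}
```

The response shows each document after the pipeline runs; with verbose=true it includes one entry per processor, so you can inspect the intermediate state at every step.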
- Suricata fields
Reposted from blog.csdn.net/The_Time_Runner/article/details/113001917