Fielddata

When you sort on a field, Elasticsearch needs access to the value of that field for every document that matches the query. The inverted index, which performs very well when searching, is not the ideal structure for sorting on field values:

When searching, we need to be able to map a term to a list of documents.
When sorting, we need to map a document to its terms. In other words, we need to “uninvert” the inverted index.

To make sorting efficient, Elasticsearch loads all the values for the field that you want to sort on into memory. This is referred to as fielddata.
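
To make the idea of "uninverting" concrete, here is a minimal Python sketch. It is only an illustration, not how Elasticsearch implements fielddata internally; the `uninvert` helper, the `city` field, and the sample data are all hypothetical.

```python
from collections import defaultdict


def uninvert(inverted_index):
    """Turn a term -> doc IDs mapping into a doc ID -> terms mapping."""
    doc_to_terms = defaultdict(list)
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            doc_to_terms[doc_id].append(term)
    return dict(doc_to_terms)


# Hypothetical inverted index for a single-valued "city" field.
inverted_index = {
    "berlin": [2],
    "london": [1, 4],
    "paris": [3],
}

# The uninverted structure plays the role of fielddata in this sketch.
fielddata = uninvert(inverted_index)

# Sort the documents that matched some query by their "city" value.
matching_docs = [1, 2, 3]
sorted_docs = sorted(matching_docs, key=lambda doc_id: fielddata[doc_id][0])
print(sorted_docs)  # [2, 1, 3]
```

Searching reads `inverted_index` directly (term to documents), while sorting needs the reverse lookup (document to terms), which is why the second structure has to be built.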

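(Note on "uninverting": an inverted index is organized around terms. It records which documents contain each term, so a search only looks up terms and does not care how each document is structured. Sorting, by contrast, orders documents: first find each matching document, then look up the value of its sort field, sort by those values, and apply the resulting order to the documents. This is similar to how a database sorts rows.)

(Note on fielddata: it presumably holds not just the values of the sort field but also the mapping between each document and its values.)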

Warning:

Elasticsearch doesn’t just load the values for the documents that matched a particular query. It loads the values from every document in your index, regardless of the document type.

The reason that Elasticsearch loads all values into memory is that uninverting the index from disk is slow. Even though you may need the values for only a few docs for the current request, you will probably need access to the values for other docs on the next request, so it makes sense to load all the values into memory at once, and to keep them there.
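
The load-once-and-keep behaviour described above can be sketched as a simple cache. This is an assumption-laden illustration in Python, not Elasticsearch code: the `FielddataCache` class, its `values_for` method, and the `load_count` counter are hypothetical names invented for this example.

```python
class FielddataCache:
    """Hypothetical per-field cache: uninvert once, then reuse."""

    def __init__(self, inverted_indexes):
        # field name -> {term: [doc IDs]}; stands in for the on-disk index.
        self._inverted_indexes = inverted_indexes
        self._loaded = {}
        self.load_count = 0  # how many times the slow path ran

    def values_for(self, field):
        if field not in self._loaded:
            # Slow path: uninvert the whole field for every document,
            # not just for the documents matching the current query.
            self.load_count += 1
            doc_to_terms = {}
            for term, doc_ids in self._inverted_indexes[field].items():
                for doc_id in doc_ids:
                    doc_to_terms.setdefault(doc_id, []).append(term)
            self._loaded[field] = doc_to_terms
        return self._loaded[field]


cache = FielddataCache({"city": {"berlin": [2], "london": [1, 4], "paris": [3]}})

first = cache.values_for("city")   # first request: loads values for all 4 docs
second = cache.values_for("city")  # later request: served from memory
print(len(first), cache.load_count)  # 4 1
```

The first request pays the full cost of uninverting the field; every later request on the same field reuses the in-memory result.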

Fielddata is used in several places in Elasticsearch:

Sorting on a field
Aggregations on a field
Certain filters (for example, geolocation filters)
Scripts that refer to fields
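
As an illustration of why aggregations need the same document-to-values lookup as sorting, the following Python sketch counts matching documents per field value. The `count_by_value` function and the sample `fielddata` mapping are hypothetical; this shows the general idea only, not the Elasticsearch aggregation implementation.

```python
from collections import Counter

# Hypothetical fielddata for a "city" field: doc ID -> list of values.
fielddata = {1: ["london"], 2: ["berlin"], 3: ["paris"], 4: ["london"]}


def count_by_value(matching_docs, fielddata):
    """Count how many matching documents carry each field value."""
    counts = Counter()
    for doc_id in matching_docs:
        counts.update(fielddata.get(doc_id, []))
    return counts


# Documents 1, 3 and 4 matched the query; document 2 did not.
buckets = count_by_value([1, 3, 4], fielddata)
print(buckets.most_common())  # [('london', 2), ('paris', 1)]
```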

Clearly, this can consume a lot of memory, especially for high-cardinality string fields—string fields that have many unique values—like the body of an email. Fortunately, insufficient memory is a problem that can be solved by horizontal scaling, by adding more nodes to your cluster.
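
To show what "high cardinality" means in practice, this small Python sketch compares the number of unique terms in two fields. The `cardinality` helper and both sample fields are made up for illustration; real memory usage depends on how Elasticsearch stores the values, which this sketch does not model.

```python
def cardinality(field_values):
    """Number of distinct terms across all documents for one field."""
    return len({term for terms in field_values for term in terms})


# One list of terms per document.
status_field = [["open"], ["closed"], ["open"], ["open"]]
body_field = [
    ["hi", "team", "lunch", "today"],
    ["quarterly", "report", "attached"],
    ["hi", "please", "review", "the", "report"],
    ["meeting", "moved", "to", "friday"],
]

print(cardinality(status_field))  # 2
print(cardinality(body_field))    # 14
```

With the same four documents, the free-text field already has seven times as many unique terms as the status field, and that gap widens as more documents are added.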

For now, all you need to know is what fielddata is, and to be aware that it can be memory hungry. Later, we will show you how to determine the amount of memory that fielddata is using, how to limit the amount of memory that is available to it, and how to preload fielddata to improve the user experience.
