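Why doc values exist (background):
Elasticsearch's inverted index makes it easy and efficient to find the documents that contain a given term.
For example, suppose an index has the following inverted index: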
Term      Doc_1  Doc_2  Doc_3
------------------------------
brown   |   X  |   X  |
dog     |   X  |      |   X
dogs    |      |   X  |   X
fox     |   X  |      |   X
foxes   |      |   X  |
in      |      |   X  |
jumped  |   X  |      |   X
lazy    |   X  |   X  |
leap    |      |   X  |
over    |   X  |   X  |   X
quick   |   X  |   X  |   X
summer  |      |   X  |
the     |   X  |      |   X
------------------------------
Now consider the following search request, which combines a query and an aggregation:
GET /my_index/_search
{
    "query" : {            #(1)
        "match" : {
            "body" : "brown"
        }
    },
    "aggs" : {             #(2)
        "popular_terms": {
            "terms" : {
                "field" : "body"
            }
        }
    }
}
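Query (1) is simple and efficient with an inverted index: look up the term brown and read the list of documents stored under it.
Aggregation (2), however, is hard. The inverted index is keyed by term, not by document, so for each document you effectively have to walk the entire term dictionary to see which terms it contains.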
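To make the cost difference concrete, here is a minimal Python sketch of both operations using only an inverted index. It is a toy model that assumes a dict of term -> set of doc IDs is a fair stand-in for Lucene's term dictionary and postings lists; `DOCS`, `build_inverted_index`, `match` and `terms_agg_with_inverted_index` are hypothetical names, not Elasticsearch APIs.

```python
from collections import Counter

# Toy corpus: the terms each document produced at index time (from the tables in this note).
DOCS = {
    "Doc_1": "brown dog fox jumped lazy over quick the".split(),
    "Doc_2": "brown dogs foxes in lazy leap over quick summer".split(),
    "Doc_3": "dog dogs fox jumped over quick the".split(),
}

def build_inverted_index(docs):
    """term -> set of doc ids (a toy stand-in for postings lists)."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

INVERTED = build_inverted_index(DOCS)

def match(term):
    """Query (1): a single dictionary lookup returns every matching document."""
    return INVERTED.get(term, set())

def terms_agg_with_inverted_index(matching_docs):
    """Aggregation (2) using only the inverted index.

    The index is keyed by term, so to learn which terms the matching docs
    contain we must scan the postings of every unique term.
    """
    counts = Counter()
    terms_scanned = 0
    for term, postings in INVERTED.items():  # visits every unique term
        terms_scanned += 1
        hits = len(postings & matching_docs)
        if hits:
            counts[term] = hits
    return counts, terms_scanned

if __name__ == "__main__":
    docs = match("brown")
    counts, scanned = terms_agg_with_inverted_index(docs)
    print(sorted(docs))                   # ['Doc_1', 'Doc_2']
    print(scanned)                        # 13
    print(counts["over"], counts["dog"])  # 2 1
```

Query (1) touches one postings list, while the aggregation visits all 13 unique terms even though only two documents matched; with millions of unique terms this full scan is what makes the approach slow.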
While the inverted index maps terms to the documents containing the term, doc values maps documents to the terms contained by the document:
Doc      Terms
-----------------------------------------------------------------
Doc_1  | brown, dog, fox, jumped, lazy, over, quick, the
Doc_2  | brown, dogs, foxes, in, lazy, leap, over, quick, summer
Doc_3  | dog, dogs, fox, jumped, over, quick, the
-----------------------------------------------------------------
Doc values use this uninverted structure, so finding exactly which terms a given document contains is easy: just read that document's row.
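The same toy model, extended with a doc values structure. As a simplifying assumption, both structures are built in one indexing pass (Elasticsearch also writes doc values at index time, but as columnar per-segment files on disk rather than Python dicts); `index_documents` and `search_with_terms_agg` are hypothetical names used only for illustration.

```python
from collections import Counter

# Toy corpus: the same term lists as the tables in this note.
DOCS = {
    "Doc_1": "brown dog fox jumped lazy over quick the".split(),
    "Doc_2": "brown dogs foxes in lazy leap over quick summer".split(),
    "Doc_3": "dog dogs fox jumped over quick the".split(),
}

def index_documents(docs):
    """Build both structures in one indexing pass.

    inverted:   term -> set of doc ids  (serves queries)
    doc_values: doc id -> sorted terms  (serves aggregations, sorting, scripts)
    """
    inverted, doc_values = {}, {}
    for doc_id, terms in docs.items():
        for term in terms:
            inverted.setdefault(term, set()).add(doc_id)
        doc_values[doc_id] = sorted(set(terms))
    return inverted, doc_values

def search_with_terms_agg(inverted, doc_values, query_term):
    """Query (1) via the inverted index, aggregation (2) via doc values."""
    hits = inverted.get(query_term, set())
    buckets = Counter()
    for doc_id in hits:  # cost grows with the number of hits, not unique terms
        buckets.update(doc_values[doc_id])
    return hits, buckets

if __name__ == "__main__":
    inverted, doc_values = index_documents(DOCS)
    hits, buckets = search_with_terms_agg(inverted, doc_values, "brown")
    print(doc_values["Doc_3"])  # ['dog', 'dogs', 'fox', 'jumped', 'over', 'quick', 'the']
    print(buckets["brown"], buckets["dog"])  # 2 1
```

The aggregation now reads only the rows of the two matching documents instead of scanning the whole term dictionary.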
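Doc values suit fields mapped with "index": "not_analyzed". They are not a fit for analyzed string fields (in Elasticsearch 2.x, doc values cannot be enabled on analyzed strings at all).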
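One practical way to see why analyzed fields are a poor match for aggregations: the terms of an analyzed field are tokens, not the original values. The sketch below uses made-up city values and a simplified analyzer (lowercase plus whitespace split) as a stand-in for Elasticsearch's real analyzers; `CITIES`, `analyzed`, `not_analyzed` and `terms_buckets` are hypothetical, illustrative names.

```python
from collections import Counter

# Hypothetical values of a "city" field, one per document (illustrative data only).
CITIES = ["New York", "New Delhi", "York"]

def analyzed(value):
    """Simplified analyzer stand-in: lowercase, then split on whitespace."""
    return value.lower().split()

def not_analyzed(value):
    """not_analyzed: the whole value is indexed as one exact term."""
    return [value]

def terms_buckets(values, to_terms):
    """Count how many documents contain each term, like a terms aggregation."""
    buckets = Counter()
    for value in values:  # one value per document
        buckets.update(set(to_terms(value)))
    return buckets

if __name__ == "__main__":
    print(terms_buckets(CITIES, analyzed))
    print(terms_buckets(CITIES, not_analyzed))
```

With the analyzed version, "New York" dissolves into "new" and "york" buckets that also absorb "New Delhi" and "York"; the not_analyzed version keeps one bucket per original value.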
Doc values are used for aggregations, sorting, and scripts.
For more details, see: https://www.elastic.co/guide/en/elasticsearch/guide/current/docvalues.html