What are doc values in Elasticsearch?

Background: why doc values exist

Elasticsearch's inverted index makes it easy and efficient to find the documents that contain a given term.

For example, suppose the inverted index for some index looks like this:

Term      Doc_1   Doc_2   Doc_3
------------------------------------
brown   |   X   |   X   |
dog     |   X   |       |   X
dogs    |       |   X   |   X
fox     |   X   |       |   X
foxes   |       |   X   |
in      |       |   X   |
jumped  |   X   |       |   X
lazy    |   X   |   X   |
leap    |       |   X   |
over    |   X   |   X   |   X
quick   |   X   |   X   |   X
summer  |       |   X   |
the     |   X   |       |   X
------------------------------------
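To make the structure concrete, here is a minimal Python sketch of how such a term-to-documents table could be built. It is only an illustration, not how Elasticsearch or Lucene implement it; the names `DOCS` and `build_inverted_index` are assumptions introduced here, and the per-document term lists are taken from the example in this article.

```python
from collections import defaultdict

# Terms per document, as in this article's example (assumed already analyzed).
DOCS = {
    "Doc_1": ["brown", "dog", "fox", "jumped", "lazy", "over", "quick", "the"],
    "Doc_2": ["brown", "dogs", "foxes", "in", "lazy", "leap", "over", "quick", "summer"],
    "Doc_3": ["dog", "dogs", "fox", "jumped", "over", "quick", "the"],
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in index.items()}


INVERTED = build_inverted_index(DOCS)

# Finding the documents for a term is a single dictionary lookup.
print(INVERTED["brown"])  # ['Doc_1', 'Doc_2']
```

The lookup cost does not depend on how many documents exist, which is why term queries are cheap with this layout.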

Now consider the following request:

GET /my_index/_search
{
  "query" : { # (1)
    "match" : {
      "body" : "brown"
    }
  },
  "aggs" : { # (2)
    "popular_terms": {
      "terms" : {
        "field" : "body"
      }
    }
  }
}

The query in (1) is simple and efficient with an inverted index: look up the term and read off its documents.

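The aggregation in (2), however, is hard to serve from the inverted index alone. For every matching document we need to know which terms it contains, and the inverted index can only answer that by walking through its terms and checking whether each one lists the document.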
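The cost can be seen in a small Python sketch. It assumes the inverted index from the example is held in a plain dictionary; `INVERTED` and `terms_of` are hypothetical names used only for illustration.

```python
# Inverted index from the example: term -> documents containing it.
INVERTED = {
    "brown": ["Doc_1", "Doc_2"],
    "dog": ["Doc_1", "Doc_3"],
    "dogs": ["Doc_2", "Doc_3"],
    "fox": ["Doc_1", "Doc_3"],
    "foxes": ["Doc_2"],
    "in": ["Doc_2"],
    "jumped": ["Doc_1", "Doc_3"],
    "lazy": ["Doc_1", "Doc_2"],
    "leap": ["Doc_2"],
    "over": ["Doc_1", "Doc_2", "Doc_3"],
    "quick": ["Doc_1", "Doc_2", "Doc_3"],
    "summer": ["Doc_2"],
    "the": ["Doc_1", "Doc_3"],
}


def terms_of(doc_id, inverted):
    """Recover a document's terms using only the inverted index.

    Every term in the index has to be visited, so the cost grows with the
    size of the whole vocabulary rather than with the size of the document.
    """
    return sorted(term for term, doc_ids in inverted.items() if doc_id in doc_ids)


print(terms_of("Doc_3", INVERTED))
# ['dog', 'dogs', 'fox', 'jumped', 'over', 'quick', 'the']
```

With 13 terms this is trivial, but a real index can hold a very large vocabulary, and the scan would be repeated for every matching document.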

While the inverted index maps terms to the documents containing the term, doc values map documents to the terms contained by the document:

Doc      Terms
-----------------------------------------------------------------
Doc_1 | brown, dog, fox, jumped, lazy, over, quick, the
Doc_2 | brown, dogs, foxes, in, lazy, leap, over, quick, summer
Doc_3 | dog, dogs, fox, jumped, over, quick, the
-----------------------------------------------------------------

Doc values use this uninverted structure, so finding the terms that a given document contains becomes easy.
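As a rough sketch of why this helps, the following Python snippet computes something like the `popular_terms` aggregation from a document-to-terms mapping. It models the idea only, not the on-disk format; `DOC_VALUES` and `popular_terms` are illustrative names, and the list of matching documents is assumed to come from the `match` query for "brown".

```python
from collections import Counter

# Document -> terms, as in the doc values table of the example.
DOC_VALUES = {
    "Doc_1": ["brown", "dog", "fox", "jumped", "lazy", "over", "quick", "the"],
    "Doc_2": ["brown", "dogs", "foxes", "in", "lazy", "leap", "over", "quick", "summer"],
    "Doc_3": ["dog", "dogs", "fox", "jumped", "over", "quick", "the"],
}


def popular_terms(matching_doc_ids, doc_values):
    """Count, for each term, how many of the matching documents contain it."""
    counts = Counter()
    for doc_id in matching_doc_ids:
        # One direct lookup per document; no scan over the vocabulary.
        counts.update(doc_values[doc_id])
    return counts


# Documents that contain "brown" according to the inverted index.
matching = ["Doc_1", "Doc_2"]
counts = popular_terms(matching, DOC_VALUES)
print(counts["brown"], counts["dog"])  # 2 1
```

The search part still uses the inverted index to find the matching documents; the uninverted structure is then used to collect their terms.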

Doc values are suited to fields mapped with "index": "not_analyzed"; they are not suited to analyzed fields.
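For illustration, a mapping body for such a field might be built as below. This is a sketch under assumptions: it uses the older string mapping syntax implied by `not_analyzed`, the type name `my_type` and field name `tag` are hypothetical, and the exact syntax depends on the Elasticsearch version in use.

```python
import json

# Hypothetical mapping body; "my_type" and "tag" are illustrative names.
mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                "tag": {
                    "type": "string",
                    "index": "not_analyzed",  # index the value as a single term
                    "doc_values": True,       # build doc values for this field
                }
            }
        }
    }
}

# JSON body that could be sent when creating the index, e.g. PUT /my_index.
print(json.dumps(mapping, indent=2))
```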

Doc values can be used for aggregations, sorting, and scripts.

For more details, see: https://www.elastic.co/guide/en/elasticsearch/guide/current/docvalues.html
