Common Elasticsearch Search Methods

I. Query String Search

1. Search all products

GET /shop_index/productInfo/_search

Response:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test": "test"
        }
      },
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "zyWpRGkB8mgaHjxk0Hfo",
        "_score": 1,
        "_source": {
          "name": "HuaWei P20",
          "desc": "Expen but easy to use",
          "price": 5300,
          "producer": "HuaWei Producer",
          "tags": [
            "Expen",
            "Fast"
          ]
        }
      },
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "HuaWei Mate8",
          "desc": "Cheap and easy to use",
          "price": 2500,
          "producer": "HuaWei Producer",
          "tags": [
            "Cheap",
            "Fast"
          ]
        }
      }
    ]
  }
}

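Field descriptions:

took: time the search took, in milliseconds
timed_out: whether the search timed out; here it did not
_shards: the index is split across 5 shards; the search ran on all 5, all 5 returned data successfully, 0 failed, and 0 were skipped
hits.total: number of matching documents, here 3
max_score: the highest _score among the matching documents; _score measures how relevant a document is to the search, and a more relevant document gets a higher score
hits.hits: detailed data for the documents that matched the search
_source: the original document body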
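
As a small illustration of the fields above, here is a sketch in Python that reads them from a parsed response. The `response` dict is a trimmed copy of the result shown earlier, and the helper names `total_hits` and `sources` are made up for this example. Elasticsearch 7 and later return `hits.total` as an object rather than a plain number, so the helper accepts both shapes.

```python
# Trimmed copy of the response above; only the fields used below are kept.
response = {
    "took": 8,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {"_id": "2", "_score": 1, "_source": {"test": "test"}},
            {"_id": "1", "_score": 1,
             "_source": {"name": "HuaWei Mate8", "price": 2500}},
        ],
    },
}


def total_hits(resp):
    """Return the hit count for either response shape (hypothetical helper)."""
    total = resp["hits"]["total"]
    # Newer versions return {"value": N, "relation": "eq"}; older ones return N.
    return total["value"] if isinstance(total, dict) else total


def sources(resp):
    """Return the _source of every returned hit, in response order."""
    return [hit["_source"] for hit in resp["hits"]["hits"]]


print(total_hits(response))
print(sources(response))
```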

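2. Search for products whose name contains HuaWei, sorted by price in descending order:

This approach is where the name "Query String Search" comes from: the search parameters are carried in the query string of the HTTP request.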

GET /shop_index/productInfo/_search?q=name:HuaWei&sort=price:desc

Response:

{
  "took": 23,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "zyWpRGkB8mgaHjxk0Hfo",
        "_score": null,
        "_source": {
          "name": "HuaWei P20",
          "desc": "Expen but easy to use",
          "price": 5300,
          "producer": "HuaWei Producer",
          "tags": [
            "Expen",
            "Fast"
          ]
        },
        "sort": [
          5300
        ]
      },
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "1",
        "_score": null,
        "_source": {
          "name": "HuaWei Mate8",
          "desc": "Cheap and easy to use",
          "price": 2500,
          "producer": "HuaWei Producer",
          "tags": [
            "Cheap",
            "Fast"
          ]
        },
        "sort": [
          2500
        ]
      }
    ]
  }
}
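
For readers who build such a request from code, the following Python sketch assembles the same path and query string. The function name `build_search_url` and its parameters are assumptions made for this example, not part of any Elasticsearch client. Note that `urlencode` percent-encodes the colons, which is equivalent to the literal form used above once the server decodes the query string.

```python
from urllib.parse import urlencode


def build_search_url(index, doc_type, field, value, sort_field, order="desc"):
    """Build a query-string search path (hypothetical helper)."""
    params = {
        "q": f"{field}:{value}",            # Lucene query-string syntax: field:value
        "sort": f"{sort_field}:{order}",    # sort field and direction
    }
    return f"/{index}/{doc_type}/_search?{urlencode(params)}"


url = build_search_url("shop_index", "productInfo", "name", "HuaWei", "price")
print(url)
```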

II. Query DSL (DSL: Domain-Specific Language)

This approach passes the search conditions as a JSON HTTP request body. It can express many kinds of complex queries and is more powerful than the query string.

1. match_all query

Search all products

GET /shop_index/productInfo/_search
{
  "query": {
    "match_all": {}
  }
}

Response omitted.

2. Full-Text Search

Search for product records whose producer field matches "HuaWei MateProducer":

GET /shop_index/productInfo/_search
{
  "query": {
    "match": {
      "producer": "HuaWei MateProducer"
    }
  }
}

Response:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "SiUBRWkB8mgaHjxkJHyS",
        "_score": 0.5753642,
        "_source": {
          "name": "HuaWei Mate10",
          "desc": "Cheap and Beauti",
          "price": 2300,
          "producer": "HuaWei MateProducer",
          "tags": [
            "Cheap",
            "Beauti"
          ]
        }
      },
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "HuaWei Mate8",
          "desc": "Cheap and easy to use",
          "price": 2500,
          "producer": "HuaWei Producer",
          "tags": [
            "Cheap",
            "Fast"
          ]
        }
      },
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "zyWpRGkB8mgaHjxk0Hfo",
        "_score": 0.18232156,
        "_source": {
          "name": "HuaWei P20",
          "desc": "Expen but easy to use",
          "price": 5300,
          "producer": "HuaWei Producer",
          "tags": [
            "Expen",
            "Fast"
          ]
        }
      },
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "CSX8RGkB8mgaHjxkV3w1",
        "_score": 0.18232156,
        "_source": {
          "name": "HuaWei nova 4e",
          "desc": "cheap and look nice",
          "price": 1999,
          "producer": "HuaWei Producer",
          "tags": [
            "Cheap",
            "Nice"
          ]
        }
      }
    ]
  }
}

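From the result above:
The record with id "SiUBRWkB8mgaHjxkJHyS" has the highest score, meaning it is the best match.
Reason:
After analysis, the search text for producer is split into the following terms:

HuaWei:
IDs of records matching this term: 'SiUBRWkB8mgaHjxkJHyS', '1', 'CSX8RGkB8mgaHjxkV3w1', 'zyWpRGkB8mgaHjxk0Hfo'
MateProducer:
IDs of records matching this term: 'SiUBRWkB8mgaHjxkJHyS'
Because both terms of "HuaWei MateProducer" match the record with ID 'SiUBRWkB8mgaHjxkJHyS', that record has the highest score.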
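
The reasoning above can be sketched in Python by counting how many query terms each record's producer field contains. This is only an illustration: `analyze` is a rough stand-in for the standard analyzer, and counting matched terms is a simplification of the real relevance formula, which also weighs term frequency and rarity.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return re.findall(r"\w+", text.lower())


# producer values copied from the response above, keyed by document ID
producers = {
    "SiUBRWkB8mgaHjxkJHyS": "HuaWei MateProducer",
    "1": "HuaWei Producer",
    "zyWpRGkB8mgaHjxk0Hfo": "HuaWei Producer",
    "CSX8RGkB8mgaHjxkV3w1": "HuaWei Producer",
}

query_terms = analyze("HuaWei MateProducer")

# For each document, list the query terms found in its producer field.
matched = {
    doc_id: [term for term in query_terms if term in analyze(text)]
    for doc_id, text in producers.items()
}

# More matched terms first; this mimics the ordering, not the actual scores.
ranked = sorted(matched, key=lambda doc_id: len(matched[doc_id]), reverse=True)

for doc_id in ranked:
    print(doc_id, matched[doc_id])
```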

Extra: descending sort

Search for products whose name contains HuaWei, sorted by price in descending order

GET /shop_index/productInfo/_search
{
  "query": {
    "match": {
      "name": "HuaWei"
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

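Response omitted…

Extra: pagination

Fetch the second page with 1 record per page (from is the zero-based offset of the first hit to return, so page 2 with size 1 means from = 1)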

GET /shop_index/productInfo/_search
{
  "query": {
    "match_all": {}
  },
  "from": 1,
  "size": 1
}
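
The relationship between a 1-based page number and the from/size pair can be sketched as follows. The helper name `page_params` is made up for this example.

```python
def page_params(page, size):
    """Translate a 1-based page number into from/size (hypothetical helper)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be at least 1")
    # "from" is the zero-based offset of the first hit on the requested page.
    return {"from": (page - 1) * size, "size": size}


body = {"query": {"match_all": {}}, **page_params(page=2, size=1)}
print(body)
```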

Extra: returning only specific fields

For example, return only the name, desc, and price fields and leave out the rest

GET /shop_index/productInfo/_search
{
  "query": {
    "match": {
      "name": "HuaWei"
    }
  },
  "_source": ["name","desc","price"]
}

Response:

{
  "took": 27,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "zyWpRGkB8mgaHjxk0Hfo",
        "_score": 0.2876821,
        "_source": {
          "price": 5300,
          "name": "HuaWei P20",
          "desc": "Expen but easy to use"
        }
      },
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "price": 2500,
          "name": "HuaWei Mate8",
          "desc": "Cheap and easy to use"
        }
      }
    ]
  }
}

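3. Phrase Search

Difference between phrase search and full-text search:

Full-text search: the search text is split into terms, each term is looked up in the inverted index, and a record matches as soon as any one term matches;
Phrase search: all terms of the search text must appear in the specified field, adjacent and in the same order, for the record to match and be returned;

For example, search for products whose producer field contains the phrase "HuaWei MateProducer":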

GET /shop_index/productInfo/_search
{
  "query": {
    "match_phrase": {
      "producer": "HuaWei MateProducer"
    }
  }
}

Response:

{
  "took": 158,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "SiUBRWkB8mgaHjxkJHyS",
        "_score": 0.5753642,
        "_source": {
          "name": "HuaWei Mate10",
          "desc": "Cheap and Beauti",
          "price": 2300,
          "producer": "HuaWei MateProducer",
          "tags": [
            "Cheap",
            "Beauti"
          ]
        }
      }
    ]
  }
}

Only the record that contains the phrase "HuaWei MateProducer" is returned.
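
To make the difference from a plain match concrete, here is a Python sketch of phrase matching: the analyzed phrase terms must appear in the field as one contiguous run, in order. The `analyze` function is a rough stand-in for the standard analyzer, and `phrase_matches` is a hypothetical helper that ignores options such as slop.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return re.findall(r"\w+", text.lower())


def phrase_matches(field_text, phrase):
    """True if the phrase terms occur in the field adjacent and in the same order."""
    tokens = analyze(field_text)
    wanted = analyze(phrase)
    n = len(wanted)
    if n == 0:
        return False
    # Slide a window of length n over the field tokens and compare.
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


print(phrase_matches("HuaWei MateProducer", "HuaWei MateProducer"))
print(phrase_matches("HuaWei Producer", "HuaWei MateProducer"))
```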

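4. term query

Term-level queries operate on the exact terms stored in the inverted index. They are typically used for structured data such as numbers, dates, and enums rather than full-text fields. The search value is not analyzed before the search, so it must be exactly one of the terms produced when the document was indexed. For example, to find all documents where age is 39: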

POST /bank/_search?pretty
{
  "query": {
    "term": {
      "age": "39"
    }
  }
}

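Note: a term query on a string does not always match. A string field is either of type text (treated as full text, like the body of an email) or of type keyword (treated as an exact value, like an email address or a zip code).

Exact values (such as numbers, dates, and keywords) are added to the inverted index exactly as given in the field, which makes them searchable.
Text fields are first passed through an analyzer, which produces a list of terms that are then added to the inverted index.

There are many ways to analyze text: the default standard analyzer removes most punctuation, breaks the text into individual words, and lowercases them.
For example, the standard analyzer turns the string "Quick Brown Fox!" into [quick, brown, fox].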
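
The three steps just described (drop punctuation, split into words, lowercase) can be approximated in a few lines of Python. This is a deliberately simplified sketch, not the real standard analyzer, which follows Unicode text segmentation rules; the name `standard_like_analyze` is made up for this example.

```python
import re


def standard_like_analyze(text):
    """Approximate the standard analyzer: lowercase, then keep runs of word chars."""
    lowered = text.lower()
    # \w+ keeps letters, digits and underscores; punctuation acts as a separator.
    return re.findall(r"\w+", lowered)


tokens = standard_like_analyze("Quick Brown Fox!")
print(tokens)
```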

Here is a test that demonstrates this.

First, create an index with explicit field mappings, then index a document.

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_text": {
          "type": "text"
        },
        "exact_value": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "full_text": "Quick Foxes!",
  "exact_value": "Quick Foxes!"
}

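The full_text field is of type text and will be analyzed.
The exact_value field is of type keyword and will not be analyzed.
The full_text inverted index will contain the terms [quick, foxes].
The exact_value inverted index will contain the exact term [Quick Foxes!].

Now compare the results of the term query and the match query: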

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "exact_value": "Quick Foxes!"
    }
  }
}

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "full_text": "Quick Foxes!"
    }
  }
}

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "full_text": "foxes"
    }
  }
}

GET my_index/my_type/_search
{
  "query": {
    "match": {
      "full_text": "Quick Foxes!"
    }
  }
}

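The first query (term on exact_value) matches, because the exact_value field contains the exact term Quick Foxes!.
The second query (term on full_text with "Quick Foxes!") does not match, because the full_text field contains only the terms quick and foxes, not the exact term Quick Foxes!.
The third query (term on full_text with "foxes") matches the full_text field.
The fourth query (match on full_text) first analyzes the query string, then looks for documents containing quick, foxes, or both.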
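
The four outcomes can be reproduced with a toy inverted index in Python. This is a sketch under simplifying assumptions: `analyze` only approximates the standard analyzer, the index holds a single document, and `term_query` / `match_query` are hypothetical helpers that return a boolean instead of scored hits.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return re.findall(r"\w+", text.lower())


doc = {"full_text": "Quick Foxes!", "exact_value": "Quick Foxes!"}

# Terms stored per field for this one document.
index = {
    "full_text": set(analyze(doc["full_text"])),  # text: analyzed into terms
    "exact_value": {doc["exact_value"]},          # keyword: kept as one exact term
}


def term_query(field, value):
    """Look up the value as-is, with no analysis of the search input."""
    return value in index[field]


def match_query(field, text):
    """Analyze the search text first, then match if any resulting term is present."""
    return any(term in index[field] for term in analyze(text))


print(term_query("exact_value", "Quick Foxes!"))
print(term_query("full_text", "Quick Foxes!"))
print(term_query("full_text", "foxes"))
print(match_query("full_text", "Quick Foxes!"))
```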

Next, check how each field is analyzed:
exact_value:

GET /my_index/_analyze
{
  "field": "exact_value",
  "text": "Quick Foxes!" 
}

Result:

{
  "tokens": [
    {
      "token": "Quick Foxes!",
      "start_offset": 0,
      "end_offset": 12,
      "type": "word",
      "position": 0
    }
  ]
}

full_text:

GET /my_index/_analyze
{
  "field": "full_text",
  "text": "Quick Foxes!" 
}

Result:

{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "foxes",
      "start_offset": 6,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

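5. Query Filter (filtering query results)

Every document in the search results carries a score (the _score field). The score is a number that expresses, in relative terms, how well the document matches the query: the higher the score, the more relevant the document is to the search keywords; the lower the score, the less relevant.
In Elasticsearch, every search triggers relevance score calculation. If relevance scores are not needed, use a different query capability instead: build a filter.
Filters are similar in concept to queries, but they run faster for two main reasons:

Filters do not compute scores, so they execute faster than queries;
Filters can be cached in memory, which speeds up repeated searches.

For example, search for product records whose name contains HuaWei and whose price is greater than 4000: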

GET /shop_index/productInfo/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "HuaWei"
          }
        }
      ], 
      "filter": {
        "range": {
          "price": {
            "gt": 4000
          }
        }
      }
    }
  }
}

Response:

{
  "took": 195,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "shop_index",
        "_type": "productInfo",
        "_id": "zyWpRGkB8mgaHjxk0Hfo",
        "_score": 0.2876821,
        "_source": {
          "name": "HuaWei P20",
          "desc": "Expen but easy to use",
          "price": 5300,
          "producer": "HuaWei Producer",
          "tags": [
            "Expen",
            "Fast"
          ]
        }
      }
    ]
  }
}
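
When the name and the price threshold come from user input, the same request body can be assembled in code. The Python sketch below only builds the JSON body; the function name `name_with_min_price` is an assumption for this example, and sending the request is left out.

```python
import json


def name_with_min_price(name, min_price):
    """Build a bool query: scored match on name, unscored range filter on price."""
    return {
        "query": {
            "bool": {
                # must: contributes to _score
                "must": [{"match": {"name": name}}],
                # filter: only includes or excludes documents, no scoring
                "filter": {"range": {"price": {"gt": min_price}}},
            }
        }
    }


body = name_with_min_price("HuaWei", 4000)
print(json.dumps(body, indent=2))
```

Because the filter clause does not contribute to scoring, the _score of 0.2876821 in the response above is the same value the plain match on name produced for this document earlier.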

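6. Range Query

Matches documents whose field contains terms within a given range. The type of the Lucene query depends on the field type: TermRangeQuery for string fields, NumericRangeQuery for number/date fields.
The following example returns all documents with an age between 10 and 20, inclusive: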

GET /bank/_search
{
    "query": {
        "range" : {
            "age" : {
                "gte" : 10,
                "lte" : 20,
                "boost" : 2.0
            }
        }
    }
}
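
The bound operators differ only in whether the endpoint is included: gte and lte are inclusive, gt and lt are exclusive. A minimal Python sketch of that comparison logic follows. The `in_range` helper and the sample ages are made up for illustration, and `boost` is not modelled because it affects scoring rather than which documents match.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Check a value against range-style bounds (hypothetical helper)."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True


ages = [9, 10, 15, 20, 21]  # sample values, not taken from the bank index
matching = [age for age in ages if in_range(age, gte=10, lte=20)]
print(matching)
```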