I. Query String Search (search driven by the URL query string)
1. Search all products
GET /shop_index/productInfo/_search
Response:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "2",
"_score": 1,
"_source": {
"test": "test"
}
},
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "zyWpRGkB8mgaHjxk0Hfo",
"_score": 1,
"_source": {
"name": "HuaWei P20",
"desc": "Expen but easy to use",
"price": 5300,
"producer": "HuaWei Producer",
"tags": [
"Expen",
"Fast"
]
}
},
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "1",
"_score": 1,
"_source": {
"name": "HuaWei Mate8",
"desc": "Cheap and easy to use",
"price": 2500,
"producer": "HuaWei Producer",
"tags": [
"Cheap",
"Fast"
]
}
}
]
}
}
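Field reference:
- took: how long the search took, in milliseconds
- timed_out: whether the search timed out; here it did not
- _shards: the index is split into 5 shards; the search ran on all 5, of which 5 returned data successfully, 0 failed, and 0 were skipped
- hits.total: the number of matching documents, 3 here
- max_score: the highest _score among the matching documents; _score measures how relevant a document is to the search, so a more relevant document gets a higher score
- hits.hits: the detailed data of the documents that matched the search
- _source: the original document content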
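As a rough illustration of how these fields might be read in client code, the Python sketch below walks a trimmed copy of the response above. The `summarize` helper and its key names are made up for this example and are not part of any Elasticsearch client. It assumes the 6.x response shape shown here, where hits.total is a plain integer.

```python
# Trimmed copy of the response shown above (6.x shape: hits.total is an integer).
response = {
    "took": 8,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {"_id": "2", "_score": 1, "_source": {"test": "test"}},
            {"_id": "zyWpRGkB8mgaHjxk0Hfo", "_score": 1,
             "_source": {"name": "HuaWei P20", "price": 5300}},
            {"_id": "1", "_score": 1,
             "_source": {"name": "HuaWei Mate8", "price": 2500}},
        ],
    },
}


def summarize(resp):
    """Flatten the fields explained above into one small dict (hypothetical helper)."""
    shards = resp["_shards"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        # every shard answered and none failed
        "all_shards_ok": shards["successful"] == shards["total"] and shards["failed"] == 0,
        "total": resp["hits"]["total"],
        "ids": [hit["_id"] for hit in resp["hits"]["hits"]],
    }


summary = summarize(response)
print(summary)
```

Checking `all_shards_ok` before trusting `total` is a design choice of this sketch: a response with failed shards can still carry hits, but the count may be incomplete.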
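2. Search for products whose name contains HuaWei, sorted by price in descending order
This form is where the name "Query String Search" comes from: the search parameters are all carried in the query string of the HTTP request.
GET /shop_index/productInfo/_search?q=name:HuaWei&sort=price:desc
Response (_score and max_score are null because the results are sorted by price rather than by relevance):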
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "zyWpRGkB8mgaHjxk0Hfo",
"_score": null,
"_source": {
"name": "HuaWei P20",
"desc": "Expen but easy to use",
"price": 5300,
"producer": "HuaWei Producer",
"tags": [
"Expen",
"Fast"
]
},
"sort": [
5300
]
},
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "1",
"_score": null,
"_source": {
"name": "HuaWei Mate8",
"desc": "Cheap and easy to use",
"price": 2500,
"producer": "HuaWei Producer",
"tags": [
"Cheap",
"Fast"
]
},
"sort": [
2500
]
}
]
}
}
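The request line used in this search can also be assembled in code. Below is a minimal sketch using only the Python standard library. The function name `query_string_search_path` is hypothetical, and the sketch only builds the path; it does not send a request.

```python
from urllib.parse import urlencode


def query_string_search_path(index, doc_type, q, sort=None):
    """Build the path plus query string for a query-string search (illustrative only)."""
    params = {"q": q}
    if sort is not None:
        params["sort"] = sort
    # safe=":" keeps the field:value colons readable instead of encoding them as %3A
    return f"/{index}/{doc_type}/_search?" + urlencode(params, safe=":")


path = query_string_search_path("shop_index", "productInfo", "name:HuaWei", "price:desc")
print(path)
```

Using `urlencode` rather than string concatenation means search text containing spaces or `&` is escaped correctly.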
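II. Query DSL (DSL: Domain Specific Language)
This approach passes the search conditions as a JSON HTTP request body. It can express many kinds of complex queries and is more powerful than the query string form.
1. match_all query
Search all products: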
GET /shop_index/productInfo/_search
{
"query": {
"match_all": {}
}
}
Response omitted.
2. Full-Text Search
Search for products whose producer field contains "HuaWei MateProducer":
GET /shop_index/productInfo/_search
{
"query": {
"match": {
"producer": "HuaWei MateProducer"
}
}
}
Response:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0.5753642,
"hits": [
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "SiUBRWkB8mgaHjxkJHyS",
"_score": 0.5753642,
"_source": {
"name": "HuaWei Mate10",
"desc": "Cheap and Beauti",
"price": 2300,
"producer": "HuaWei MateProducer",
"tags": [
"Cheap",
"Beauti"
]
}
},
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "1",
"_score": 0.2876821,
"_source": {
"name": "HuaWei Mate8",
"desc": "Cheap and easy to use",
"price": 2500,
"producer": "HuaWei Producer",
"tags": [
"Cheap",
"Fast"
]
}
},
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "zyWpRGkB8mgaHjxk0Hfo",
"_score": 0.18232156,
"_source": {
"name": "HuaWei P20",
"desc": "Expen but easy to use",
"price": 5300,
"producer": "HuaWei Producer",
"tags": [
"Expen",
"Fast"
]
}
},
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "CSX8RGkB8mgaHjxkV3w1",
"_score": 0.18232156,
"_source": {
"name": "HuaWei nova 4e",
"desc": "cheap and look nice",
"price": 1999,
"producer": "HuaWei Producer",
"tags": [
"Cheap",
"Nice"
]
}
}
]
}
}
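From the results above:
The record with id "SiUBRWkB8mgaHjxkJHyS" has the highest score, meaning it is the best match.
Reason:
After analysis, the search text for producer is split into these terms:
- HuaWei:
matched by record IDs 'SiUBRWkB8mgaHjxkJHyS', '1', 'CSX8RGkB8mgaHjxkV3w1', 'zyWpRGkB8mgaHjxk0Hfo'
- MateProducer:
matched by record ID 'SiUBRWkB8mgaHjxkJHyS'
Both terms of "HuaWei MateProducer" match the record with ID 'SiUBRWkB8mgaHjxkJHyS', so that record gets the highest score.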
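To make this reasoning concrete, here is a toy Python sketch that counts how many query terms each producer value contains. It is not how Elasticsearch computes _score (real scoring also weighs term frequency, term rarity, and field length, and is calculated per shard). The `analyze` helper is a crude whitespace-and-lowercase stand-in for a real analyzer.

```python
def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


# producer values copied from the response above, keyed by document id
producers = {
    "SiUBRWkB8mgaHjxkJHyS": "HuaWei MateProducer",
    "1": "HuaWei Producer",
    "zyWpRGkB8mgaHjxk0Hfo": "HuaWei Producer",
    "CSX8RGkB8mgaHjxkV3w1": "HuaWei Producer",
}


def matched_terms(query, field_value):
    """Return the query terms that also occur in the field (OR semantics of match)."""
    field_terms = set(analyze(field_value))
    return [term for term in analyze(query) if term in field_terms]


counts = {
    doc_id: len(matched_terms("HuaWei MateProducer", value))
    for doc_id, value in producers.items()
}
best = max(counts, key=counts.get)
print(counts)
print(best)
```

Every document matches at least one term, so all four are returned; only one document matches both terms, which is consistent with it ranking first in the response.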
Extra: descending sort
Find products whose name contains HuaWei, sorted by price in descending order:
GET /shop_index/productInfo/_search
{
"query": {
"match": {
"name": "HuaWei"
}
},
"sort": [
{
"price": {
"order": "desc"
}
}
]
}
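Response omitted.
Extra: pagination
Fetch the second page with 1 record per page (from is the zero-based offset of the first hit to return, so from = (page - 1) * size):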
GET /shop_index/productInfo/_search
{
"query": {
"match_all": {}
},
"from": 1,
"size": 1
}
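The page arithmetic behind from and size is easy to get wrong by one, so a small helper may be useful. This is a sketch; the name `page_to_from_size` is hypothetical, and pages are assumed to be numbered from 1.

```python
def page_to_from_size(page, size):
    """Translate a 1-based page number into the from/size pair used by _search."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be >= 1")
    # "from" is the zero-based offset of the first hit on the requested page
    return {"from": (page - 1) * size, "size": size}


# Rebuild the request body shown above: second page, one record per page
body = {"query": {"match_all": {}}, **page_to_from_size(page=2, size=1)}
print(body)
```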
Extra: returning only specific fields
For example, return only the name, desc, and price fields and leave out the rest:
GET /shop_index/productInfo/_search
{
"query": {
"match": {
"name": "HuaWei"
}
},
"_source": ["name","desc","price"]
}
Response:
{
"took": 27,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.2876821,
"hits": [
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "zyWpRGkB8mgaHjxk0Hfo",
"_score": 0.2876821,
"_source": {
"price": 5300,
"name": "HuaWei P20",
"desc": "Expen but easy to use"
}
},
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "1",
"_score": 0.2876821,
"_source": {
"price": 2500,
"name": "HuaWei Mate8",
"desc": "Cheap and easy to use"
}
}
]
}
}
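3. Phrase Search
Difference between phrase search and full-text search:
- Full-text match: the search text is analyzed into terms, each term is looked up in the inverted index, and a record matches as soon as any one term matches;
- Phrase search: the field must contain all the terms of the search text, adjacent to each other and in the same order, for the record to match and be returned;
For example, search for products whose producer contains the phrase "HuaWei MateProducer":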
GET /shop_index/productInfo/_search
{
"query": {
"match_phrase": {
"producer": "HuaWei MateProducer"
}
}
}
Response:
{
"took": 158,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "SiUBRWkB8mgaHjxkJHyS",
"_score": 0.5753642,
"_source": {
"name": "HuaWei Mate10",
"desc": "Cheap and Beauti",
"price": 2300,
"producer": "HuaWei MateProducer",
"tags": [
"Cheap",
"Beauti"
]
}
}
]
}
}
As shown, only the record whose producer contains the phrase "HuaWei MateProducer" is returned.
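The contrast between match and match_phrase can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: `analyze` only lowercases and splits on whitespace, and the phrase check ignores the slop parameter that the real match_phrase query supports.

```python
def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def match_matches(query, field_value):
    """match: true as soon as any query term occurs in the field."""
    field_terms = set(analyze(field_value))
    return any(term in field_terms for term in analyze(query))


def phrase_matches(phrase, field_value):
    """match_phrase: the terms must appear consecutively and in the same order."""
    needle, haystack = analyze(phrase), analyze(field_value)
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


query = "HuaWei MateProducer"
for producer in ("HuaWei MateProducer", "HuaWei Producer"):
    print(producer, match_matches(query, producer), phrase_matches(query, producer))
```

"HuaWei Producer" passes the match check through the shared term huawei but fails the phrase check, which mirrors the single hit in the response.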
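4. term query
Term-level queries operate on the exact terms stored in the inverted index. They are typically used for structured data such as numbers, dates, and enums rather than full-text fields. The search text is not analyzed before searching, so it must be exactly one of the terms indexed for the document. For example, to find all documents with an age of 39: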
POST /bank/_search?pretty
{
"query": {
"term": {
"age": "39"
}
}
}
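Note: a term query on a string field does not always match. A string field can be of type text (treated as full text, like the body of an email) or keyword (treated as an exact value, like an email address or a zip code).
- Exact values (such as numbers, dates, and keywords) are added to the inverted index exactly as specified in the field, which makes them searchable.
- Text fields are first run through an analyzer to produce a list of terms, which are then added to the inverted index.
Text can be analyzed in many ways. The default standard analyzer removes most punctuation, breaks the text into individual words, and lowercases them.
For example, the standard analyzer turns the string "Quick Brown Fox!" into [quick, brown, fox].
Here is a test to demonstrate this.
First, create an index with explicit field mappings, then index a document.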
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"full_text": {
"type": "text"
},
"exact_value": {
"type": "keyword"
}
}
}
}
}
PUT my_index/my_type/1
{
"full_text": "Quick Foxes!",
"exact_value": "Quick Foxes!"
}
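- The full_text field is of type text and will be analyzed.
- The exact_value field is of type keyword and will not be analyzed.
- The inverted index for full_text will contain the terms [quick, foxes].
- The inverted index for exact_value will contain the exact term [Quick Foxes!].
Now compare the results of the term query and the match query: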
GET my_index/my_type/_search
{
"query": {
"term": {
"exact_value": "Quick Foxes!"
}
}
}
GET my_index/my_type/_search
{
"query": {
"term": {
"full_text": "Quick Foxes!"
}
}
}
GET my_index/my_type/_search
{
"query": {
"term": {
"full_text": "foxes"
}
}
}
GET my_index/my_type/_search
{
"query": {
"match": {
"full_text": "Quick Foxes!"
}
}
}
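- The first query (term on exact_value) matches, because the exact_value field contains the exact term Quick Foxes!.
- The second query (term on full_text with Quick Foxes!) does not match, because the full_text field only contains the terms quick and foxes, not the exact term Quick Foxes!.
- The third query (term foxes on full_text) matches.
- The fourth query (match on full_text) first analyzes the query string, then looks for documents containing quick, foxes, or both, so it matches.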
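The four outcomes can be reproduced with a small in-memory model. This is a sketch under stated assumptions: `standard_analyze` is a simplified imitation of the standard analyzer (word characters only, lowercased), and `term_query` / `match_query` are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def standard_analyze(text):
    """Simplified imitation of the standard analyzer: word tokens, lowercased."""
    return [token.lower() for token in re.findall(r"\w+", text)]


doc = {"full_text": "Quick Foxes!", "exact_value": "Quick Foxes!"}

# What ends up in the inverted index for each field
indexed = {
    "full_text": set(standard_analyze(doc["full_text"])),  # text: analyzed
    "exact_value": {doc["exact_value"]},                   # keyword: stored as is
}


def term_query(field, value):
    """term: look the value up as is, without analyzing it first."""
    return value in indexed[field]


def match_query(field, text):
    """match on a text field: analyze the query text, then match any term."""
    return any(token in indexed[field] for token in standard_analyze(text))


results = [
    term_query("exact_value", "Quick Foxes!"),  # first query
    term_query("full_text", "Quick Foxes!"),    # second query
    term_query("full_text", "foxes"),           # third query
    match_query("full_text", "Quick Foxes!"),   # fourth query
]
print(results)  # [True, False, True, True]
```

The only difference between the second and fourth queries is whether the search text is analyzed before lookup, which is exactly the term versus match distinction.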
Next, look at how each field is analyzed:
exact_value:
GET /my_index/_analyze
{
"field": "exact_value",
"text": "Quick Foxes!"
}
Result:
{
"tokens": [
{
"token": "Quick Foxes!",
"start_offset": 0,
"end_offset": 12,
"type": "word",
"position": 0
}
]
}
full_text:
GET /my_index/_analyze
{
"field": "full_text",
"text": "Quick Foxes!"
}
Result:
{
"tokens": [
{
"token": "quick",
"start_offset": 0,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "foxes",
"start_offset": 6,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
}
]
}
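5. Query Filter (filtering query results)
Each hit carries a score (the _score field in the search results). The score is a number that measures, in relative terms, how well the document matches the query: the higher the score, the more relevant the document is to the search terms; the lower the score, the less relevant.
In Elasticsearch, a query computes a relevance score by default. When the relevance score is not needed, use the other query capability: filters.
Filters are conceptually similar to queries, but they can run faster for two main reasons:
- Filters do not compute scores, so they run faster than scoring queries;
- Filters can be cached in memory, which speeds up repeated searches.
For example, find products whose name contains HuaWei and whose price is greater than 4000: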
GET /shop_index/productInfo/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "HuaWei"
}
}
],
"filter": {
"range": {
"price": {
"gt": 4000
}
}
}
}
}
}
Response:
{
"took": 195,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "shop_index",
"_type": "productInfo",
"_id": "zyWpRGkB8mgaHjxk0Hfo",
"_score": 0.2876821,
"_source": {
"name": "HuaWei P20",
"desc": "Expen but easy to use",
"price": 5300,
"producer": "HuaWei Producer",
"tags": [
"Expen",
"Fast"
]
}
}
]
}
}
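The division of labour between must and filter can be mimicked with a toy Python model. The score of 1.0 is a placeholder, not a real relevance score, and `bool_search` is a hypothetical helper; the point is only that the filter step includes or excludes a document without touching its score.

```python
# Two of the products from the examples above
products = [
    {"_id": "zyWpRGkB8mgaHjxk0Hfo", "name": "HuaWei P20", "price": 5300},
    {"_id": "1", "name": "HuaWei Mate8", "price": 2500},
]


def bool_search(docs, must_term, price_gt):
    """Toy bool query: one must clause on name, one range filter on price."""
    hits = []
    for doc in docs:
        # must clause: decides matching and contributes to the score
        score = 1.0 if must_term.lower() in doc["name"].lower().split() else 0.0
        if score == 0.0:
            continue
        # filter clause: a yes/no check that leaves the score unchanged
        if not doc["price"] > price_gt:
            continue
        hits.append({"_id": doc["_id"], "_score": score})
    return hits


hits = bool_search(products, "HuaWei", 4000)
print(hits)
```

Both products satisfy the must clause, but only HuaWei P20 passes the price filter, which agrees with the single hit in the response.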
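6. Range Query
Matches documents whose field value falls within a given range. The type of the underlying Lucene query depends on the field type: a TermRangeQuery for string fields and a NumericRangeQuery for number/date fields.
The following example returns all documents with an age between 10 and 20, inclusive: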
GET /bank/_search
{
"query": {
"range" : {
"age" : {
"gte" : 10,
"lte" : 20,
"boost" : 2.0
}
}
}
}
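The range query accepts the following parameters:
- gte: greater than or equal to
- gt: greater than
- lte: less than or equal to
- lt: less than
- boost: sets the boost value of the query, defaults to 1.0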
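The four bound parameters map directly onto comparison operators. Here is a minimal Python sketch of that mapping; `in_range` and the sample ages are invented for illustration, and boost is left out because it affects scoring, not which documents match.

```python
import operator

# Each range parameter corresponds to one comparison operator
RANGE_OPS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def in_range(value, **bounds):
    """Check a value against any combination of gte/gt/lte/lt bounds."""
    for name, bound in bounds.items():
        if name not in RANGE_OPS:
            raise ValueError(f"unknown range parameter: {name}")
        if not RANGE_OPS[name](value, bound):
            return False
    return True


ages = [9, 10, 15, 20, 21]  # hypothetical sample values
inclusive = [age for age in ages if in_range(age, gte=10, lte=20)]
exclusive = [age for age in ages if in_range(age, gt=10, lt=20)]
print(inclusive)  # [10, 15, 20]
print(exclusive)  # [15]
```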
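7. Bool query (compound query)
must query
Returns documents whose address matches both mill and lane:
must: all conditions must be satisfied (similar to &&)
should query
Returns documents whose address matches mill or lane:
should: satisfying any one condition is enough (similar to ||)
must_not query
Returns documents whose address matches neither mill nor lane:
must_not: none of the conditions may be satisfied (similar to !(A || B))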
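The section gives no request bodies for these three cases, so the Python sketch below shows one plausible way to build them, together with a toy evaluator for the boolean semantics. The helper names and the sample addresses are invented for illustration. The should branch assumes a bool query with no must or filter clauses, where at least one should clause has to match.

```python
def bool_query(occurrence, field, words):
    """Build a bool query body with one match clause per word."""
    if occurrence not in ("must", "should", "must_not"):
        raise ValueError(f"unsupported clause type: {occurrence}")
    clauses = [{"match": {field: word}} for word in words]
    return {"query": {"bool": {occurrence: clauses}}}


def evaluate(occurrence, words, address):
    """Toy evaluation of must / should / must_not against one address string."""
    terms = set(address.lower().split())
    present = [word.lower() in terms for word in words]
    if occurrence == "must":
        return all(present)      # A && B
    if occurrence == "should":
        return any(present)      # A || B
    return not any(present)      # must_not: !(A || B)


words = ["mill", "lane"]
addresses = ["990 Mill Road", "198 Mill Lane", "72 Lane Court", "5 Oak Street"]  # made up

results = {
    occurrence: [a for a in addresses if evaluate(occurrence, words, a)]
    for occurrence in ("must", "should", "must_not")
}
print(bool_query("must", "address", words))
print(results)
```

The three clause types can also be combined inside a single bool object, in which case a document has to satisfy every must clause and no must_not clause.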
8. Aggregations
To be continued...
Reference: https://segmentfault.com/a/1190000018634655
Reference: https://www.cnblogs.com/shaosks/p/7813729.html
Reference: https://www.cnblogs.com/shaosks/p/7825476.html
Reference: https://www.cnblogs.com/shaosks/p/7810046.html