Elasticsearch in Depth 4

Category: Other. Posted 2019-06-13 11:21:29

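Indexing a field twice to fix string sorting

Sorting on a string field often gives inaccurate results: after analysis the field holds several separate terms, and sorting on those terms is not the ordering we want.

The usual solution is to index the string field twice: once analyzed, for searching, and once not analyzed, for sorting.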

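The mapping below uses Elasticsearch 5.x/6.x syntax (mapping types such as article were removed in later versions). title is indexed the first time as an analyzed text field; title.raw, declared under "fields", indexes it a second time as a single unanalyzed keyword value. "fielddata": true additionally loads the analyzed terms of title into memory as a forward index, which is only needed to sort or aggregate on title itself.

PUT /website
{
    "mappings":{
        "article":{
            "properties":{
                "title":{
                    "type":"text",
                    "fields":{
                        "raw":{
                            "type":"keyword"
                        }
                    },
                    "fielddata":true
                },
                "content":{
                    "type":"text"
                },
                "post_date":{
                    "type":"date"
                },
                "author_id":{
                    "type":"long"
                }
            }
        }
    }
}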

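Sorting directly on title would order documents by their analyzed terms, which can give the wrong result. Sorting on title.raw uses the unanalyzed index instead:

GET /website/article/_search
{
    "query":{
        "match_all":{}
    },
    "sort":[
        {
            "title.raw":{
                "order":"desc"
            }
        }
    ]
}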
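To see why sorting on the analyzed field goes wrong, here is a small Python sketch. It is not Elasticsearch code: it assumes a simplified analyzer (lowercase, split on whitespace) and models a descending sort on a multi-valued field as comparing the largest term of each document, which is Elasticsearch's default sort mode for desc. The titles are made up for illustration.

```python
def analyze(text):
    """Simplified stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# Hypothetical titles, chosen so the two orderings differ.
titles = ["apple zebra", "banana yak", "cherry"]

# Sorting on the analyzed field: each document is represented by its terms,
# and a descending sort compares the largest term of each document.
by_terms_desc = sorted(titles, key=lambda t: max(analyze(t)), reverse=True)

# Sorting on the unanalyzed sub-field: the whole string is a single value.
by_raw_desc = sorted(titles, reverse=True)

print(by_terms_desc)
print(by_raw_desc)
```

In this sketch "apple zebra" comes first when sorting on terms, because of the term zebra, but last when sorting on the whole string.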

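Relevance scoring: the TF/IDF algorithm explained

1. Introduction to the algorithm

The relevance score algorithm, put simply, computes how closely a piece of text in the index matches the search text.

Elasticsearch uses the term frequency/inverse document frequency algorithm, TF/IDF for short. (Since version 5.0 the default similarity is BM25, which builds on the same three factors described here.)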

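Term frequency: how many times each term of the search text occurs in the field text. The more occurrences, the more relevant the document.

Search request: hello world

doc1: hello you, and world is very good
doc2: hello, how are you

doc1 is more relevant: it contains both hello and world, while doc2 contains only hello.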
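The term-frequency example can be checked with a few lines of Python. This is a sketch under simple assumptions: tokens are lowercase word characters, and the raw match count stands in for the term-frequency factor (Lucene's classic similarity uses the square root of the count, which preserves the ordering).

```python
import re
from collections import Counter


def term_counts(text):
    """Count lowercase word tokens in a piece of text."""
    return Counter(re.findall(r"\w+", text.lower()))


query_terms = ["hello", "world"]
docs = {
    "doc1": "hello you, and world is very good",
    "doc2": "hello, how are you",
}

# Total number of query-term occurrences per document.
matches = {
    name: sum(term_counts(text)[term] for term in query_terms)
    for name, text in docs.items()
}

print(matches)
```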

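Inverse document frequency: how often each term of the search text occurs across all documents in the index. The more common a term is, the less a match on it contributes to relevance.

Search request: hello world

doc1: hello, today is very good
doc2: hi world, how are you

Suppose the index holds 10,000 documents, and across all of them hello occurs 1,000 times while world occurs only 100 times.

doc2 is more relevant: it matches world, the rarer term.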
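The effect of rarity can be made concrete with the idf formula from Lucene's classic TF/IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)). The sketch below assumes the counts from the example are document frequencies, that is, hello appears in 1,000 of the 10,000 documents and world in 100.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Inverse document frequency as in Lucene's classic similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


num_docs = 10_000
idf_hello = classic_idf(1000, num_docs)  # common term
idf_world = classic_idf(100, num_docs)   # rare term

print(round(idf_hello, 2), round(idf_world, 2))
```

The rarer term gets the larger weight, so a match on world counts for more than a match on hello.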

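Field-length norm: the length of the field. The longer the field a term matches in, the weaker the relevance.

Search request: hello world

doc1: { "title": "hello article", "content": "babaaba (10,000 words)" }
doc2: { "title": "my article", "content": "blablabala (10,000 words), hi world" }

Assume hello and world occur equally often across the whole index.

doc1 is more relevant: its match is in the short title field, while doc2's match is in the much longer content field.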
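As a rough illustration of the field-length norm, Lucene's classic similarity uses norm = 1 / sqrt(number of terms in the field). The sketch below applies that formula to the two matching fields, assuming the title holds 2 terms and the content field 10,002 terms. Lucene stores norms in a lossy compressed form, so real values are only approximately these.

```python
import math


def length_norm(num_terms):
    """Field-length norm as in Lucene's classic similarity (before encoding)."""
    return 1.0 / math.sqrt(num_terms)


title_norm = length_norm(2)         # doc1 matches in "hello article"
content_norm = length_norm(10_002)  # doc2 matches in 10,000 words plus "hi world"

print(round(title_norm, 4), round(content_norm, 4))
```

The match in the two-term title is weighted roughly 70 times more heavily than the match in the long content field.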

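Both requests below take the same query body. The first explains how one document (ID 111) scores against the query; the second attaches a score explanation to every hit.

GET /people/man/111/_explain
{
    "query":{
        "match":{
            "name":"ajax"
        }
    }
}

GET /people/man/_search?explain=true
{
    "query":{
        "match":{
            "name":"ajax"
        }
    }
}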
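An explanation comes back as a nested tree in which each node has a value, a description, and a list of details. The Python sketch below flattens such a tree into indented lines so the contributing factors are easier to read. The sample tree is made up for illustration; its descriptions and numbers are not real Elasticsearch output.

```python
def flatten_explanation(node, depth=0):
    """Return one indented line per node of an explanation tree."""
    indent = "  " * depth
    lines = ["{}{:.4f}  {}".format(indent, node["value"], node["description"])]
    for child in node.get("details", []):
        lines.extend(flatten_explanation(child, depth + 1))
    return lines


# Hypothetical tree shaped like an explanation; values are illustrative only.
sample = {
    "value": 0.5,
    "description": "weight(name:ajax), product of:",
    "details": [
        {"value": 2.0, "description": "idf", "details": []},
        {"value": 0.25, "description": "tfNorm", "details": []},
    ],
}

lines = flatten_explanation(sample)
print("\n".join(lines))
```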

Reposted from www.cnblogs.com/jiahaoJAVA/p/11015026.html