Elasticsearch Score Calculation

Every hit returned by an Elasticsearch search carries a _score value, which measures how relevant that hit is to the query.

Contents

  1. The three scoring principles of Elasticsearch
  2. Understanding why Elasticsearch assigned a given score
  3. The underlying text-scoring algorithms of Elasticsearch

Versions

  1. Before 5.0, the default scoring algorithm is TF/IDF
  2. From 5.0 onward, the default scoring algorithm is BM25
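One visible difference between the two algorithms is how they treat repeated occurrences of a term. The following is an illustrative Python sketch, not Elasticsearch code. It assumes the classic TF/IDF similarity uses sqrt(freq) as its term-frequency factor, and that BM25 uses its saturating form with k1 = 1.2 and with length normalization switched off (b = 0) to keep the comparison simple. The function names are made up for this example.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor assumed for classic TF/IDF: sqrt(freq)."""
    return math.sqrt(freq)


def bm25_tf(freq: float, k1: float = 1.2) -> float:
    """BM25 term-frequency factor with b = 0 (no length normalization)."""
    return freq * (k1 + 1) / (freq + k1)


# Print both factors side by side for a growing term frequency.
for freq in (1, 2, 5, 20, 100):
    print(f"freq={freq:>3}  tf/idf={classic_tf(freq):7.3f}  bm25={bm25_tf(freq):6.3f}")
```

Under these assumptions the TF/IDF factor keeps growing with the term frequency, while the BM25 factor levels off below k1 + 1, so stuffing a field with the same word pays off less and less.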

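Elasticsearch scoring principles

  1. Term frequency
    • The more often the term appears in the field, the higher the score
  2. Inverse document frequency
    • The more documents in the index (the "table") contain the term, the lower the score
    • For example, if an index holds 10 documents and 9 of them contain hello, a match on hello adds little to the score
  3. Field-length norm
    • The longer the field (measured in terms), the lower the relevance: a match in a short text scores higher than the same match in a long text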
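The three principles above can be tried out on a toy corpus. This is a minimal Python sketch rather than how Elasticsearch is implemented: it assumes the BM25 formulas with the parameters k1 = 1.2 and b = 0.75, treats each document as a plain list of whitespace-separated terms, and the name bm25_score is hypothetical.

```python
import math

K1, B = 1.2, 0.75  # BM25 parameters assumed for this sketch


def bm25_score(term, doc, corpus):
    """Score a single term against one document (a list of terms)."""
    doc_count = len(corpus)
    doc_freq = sum(1 for d in corpus if term in d)      # documents containing the term
    avg_len = sum(len(d) for d in corpus) / doc_count   # average field length in terms
    freq = doc.count(term)                              # term frequency in this document
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = freq * (K1 + 1) / (freq + K1 * (1 - B + B * len(doc) / avg_len))
    return idf * tf_norm


corpus = [
    "hello world".split(),
    "hello hello world".split(),
    "hello there this is a much longer piece of text about search".split(),
    "rare term".split(),
]

short_once = bm25_score("hello", corpus[0], corpus)
short_twice = bm25_score("hello", corpus[1], corpus)
long_once = bm25_score("hello", corpus[2], corpus)
rare_once = bm25_score("rare", corpus[3], corpus)

print(short_twice > short_once)  # term frequency: more occurrences score higher
print(rare_once > short_once)    # inverse document frequency: a rarer term scores higher
print(short_once > long_once)    # field-length norm: a shorter field scores higher
```

Each comparison isolates one principle: the first changes only how often hello occurs, the second swaps a common term for a rare one in documents of equal length, and the third keeps the term and its frequency but lengthens the field.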

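Understanding why Elasticsearch assigned a given score

  1. Append explain=true to the search API request, and each hit comes back with an _explanation that breaks its score down, for example: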
      {
        "_shard" : "[us][1]",
        "_node" : "uJzJEIZuR2mmGO4sGkhyAg",
        "_index" : "us",
        "_type" : "tweet",
        "_id" : "12",
        "_score" : 0.1671281,
        "_source" : {
          "date" : "2014-09-22",
          "name" : "John Smith",
          "tweet" : "Elasticsearch and I have left the honeymoon stage, and I still love her.",
          "user_id" : 1
        },
        "_explanation" : {
          "value" : 0.16712809,
          "description" : "weight(tweet:elasticsearch in 2) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.16712809,
              "description" : "score(doc=2,freq=1.0 = termFreq=1.0\n), product of:",
              "details" : [
                {
                  "value" : 0.18232156,
                  "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 2.0,
                      "description" : "docFreq",
                      "details" : [ ]
                    },
                    {
                      "value" : 2.0,
                      "description" : "docCount",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.9166666,
                  "description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "termFreq=1.0",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "parameter k1",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "parameter b",
                      "details" : [ ]
                    },
                    {
                      "value" : 9.0,
                      "description" : "avgFieldLength",
                      "details" : [ ]
                    },
                    {
                      "value" : 11.0,
                      "description" : "fieldLength",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
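The numbers in this explanation can be checked by hand. The Python sketch below copies the two formulas from the description strings of the output and plugs in the values reported under details. The function names idf and tf_norm are chosen for this example and are not Elasticsearch APIs.

```python
import math


def idf(doc_freq: float, doc_count: float) -> float:
    """idf, as given in the explain output's description string."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_norm(freq: float, k1: float, b: float,
            field_length: float, avg_field_length: float) -> float:
    """tfNorm, as given in the explain output's description string."""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


# Values taken from the "details" entries of the explanation above.
idf_value = idf(doc_freq=2.0, doc_count=2.0)
tf_value = tf_norm(freq=1.0, k1=1.2, b=0.75,
                   field_length=11.0, avg_field_length=9.0)
score = idf_value * tf_value  # "product of" idf and tfNorm

print(f"idf={idf_value:.8f} tfNorm={tf_value:.8f} score={score:.8f}")
```

The results agree with the reported idf (0.18232156), tfNorm (0.9166666) and _score (0.16712809) up to single-precision rounding. Since both documents in the shard contain the term, idf is low, and since this field (11 terms) is longer than the average (9 terms), tfNorm falls below 1.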

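The underlying text-scoring algorithms of Elasticsearch

  1. Official documentation 1
  2. Official documentation 2
  3. Official documentation 3
  4. Chinese-language introduction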

Reposted from blog.csdn.net/weixin_42290927/article/details/107657717