Elasticsearch Core Technology and Practice, Study Notes 55: Part 2 Summary and Review

1. Preface

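This post is part of my study-notes series for the Geek Time (极客时间) course Elasticsearch Core Technology and Practice (Elasticsearch核心技术与实战).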

2. Search and Scoring

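Structured vs. unstructured search
- Term queries do not analyze the input, whereas full-text Match queries do: on a text field the query string is analyzed, while on a keyword field it is not analyzed and the match is rewritten as a term query
- Map a field as Keyword when it needs exact matching or is used in aggregations
Query Context vs. Filter Context
- Filter Context skips scoring and can use the cache, so it performs better than Query Context
- In a Bool query, both Filter and Must Not run in Filter Context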
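
As a rough illustration of the two contexts, the sketch below builds a Bool query body as a plain Python dict. The helper `build_bool_query` and the field names (`title`, `status`, `price`) are hypothetical; only the `bool`, `must`, `filter`, `must_not`, `match`, `term`, and `range` keys come from the Elasticsearch query DSL.

```python
import json


def build_bool_query(text, status, max_price):
    """Build a Bool query body; the field names are made up for illustration."""
    return {
        "query": {
            "bool": {
                # Query context: this clause contributes to _score.
                "must": [{"match": {"title": text}}],
                # Filter context: yes/no matching only, no scoring, cacheable.
                "filter": [{"term": {"status": status}}],
                # must_not also runs in filter context.
                "must_not": [{"range": {"price": {"gt": max_price}}}],
            }
        }
    }


body = build_bool_query("elasticsearch", "published", 100)
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request. Conditions that only narrow the result set, and should not influence ranking, are the ones worth moving into `filter`.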

2.1 Search and Scoring

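Scoring
- TF-IDF (BM25 is the default similarity since Elasticsearch 5.0) / field Boosting (adjusts the final score Elasticsearch computes)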
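Single query string across multiple fields: multi_match
- best_fields (uses the score of the single best-matching field) / most_fields (adds the scores of the matching fields together) / cross_fields (treats the fields as one combined field and looks for each term across all of them)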
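
To make the difference between best_fields and most_fields concrete, here is a simplified sketch of how per-field scores could be combined. The per-field scores are invented numbers, the function names are hypothetical, and the real scoring involves more detail than this; `tie_breaker` is the multi_match parameter that lets non-best fields contribute a fraction of their score.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Best field's score, plus tie_breaker times the scores of the other fields."""
    scores = list(field_scores.values())
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def most_fields_score(field_scores):
    """Scores of all matching fields added together."""
    return sum(field_scores.values())


# Hypothetical per-field scores for one document.
scores = {"title": 2.0, "body": 0.5}

print(best_fields_score(scores))                   # 2.0
print(best_fields_score(scores, tie_breaker=0.5))  # 2.25
print(most_fields_score(scores))                   # 2.5
```

best_fields suits queries where the terms are expected to appear together in one field, while most_fields rewards documents that match in several fields.
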
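Improving search relevance
- Multiple languages: use sub-fields with different analyzers to improve results
- Search Template: separates application code from the search DSL
- Test often, and keep monitoring and analyzing users' queries and how well they perform, then tune accordingly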

2.2 Review: Aggregations / Pagination

Aggregations
- Bucket / Metric / Pipeline
Pagination
- From & Size (the most commonly used) / Search After / Scroll API
- Avoid deep pagination; for bulk operations such as data export, use the Scroll API
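
The cost of deep pagination can be estimated with a little arithmetic: for a From & Size request, every shard has to produce its own top `from + size` hits before the coordinating node can merge them. The sketch below shows that estimate and, as an alternative, a Search After request body. The helper names and the sort fields `timestamp` and `doc_id` are hypothetical; `size`, `sort`, and `search_after` are the actual request keys.

```python
def docs_gathered(num_shards, from_, size):
    """Hits the coordinating node may need to collect and sort for one page."""
    # Every shard has to return its own top (from + size) hits.
    return num_shards * (from_ + size)


def search_after_body(last_sort_values, size=10):
    """Request body for the next page, resuming after the previous page's last hit."""
    return {
        "size": size,
        # The sort needs a tiebreaker field so every hit has a unique position.
        "sort": [{"timestamp": "asc"}, {"doc_id": "asc"}],
        # Sort values copied from the last hit of the previous page.
        "search_after": last_sort_values,
    }


print(docs_gathered(5, 0, 10))      # 50
print(docs_gathered(5, 9990, 10))   # 50000
print(search_after_body([1700000000000, "doc-42"]))
```

With 5 shards, the first page touches at most 50 hits while the thousandth page touches 50000, which is why `from + size` is capped by `index.max_result_window` (10000 by default). Search After keeps each request at `size` hits per shard regardless of how far the user has scrolled.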

2.3 Review: Elasticsearch's Distributed Model

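Distributed document storage
- A document is routed to a shard by a hash algorithm and stored there, which is why the number of primary shards cannot be changed once the index is created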
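
The link between routing and the fixed primary shard count can be sketched with the simplified formula `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. In the sketch below, `zlib.crc32` is only a stand-in for the hash Elasticsearch actually uses (Murmur3), and the `route` helper and document ids are hypothetical.

```python
import zlib


def route(routing_key, num_primary_shards):
    """Pick a shard for a routing key using hash-modulo routing."""
    # crc32 is an assumption for illustration; the modulo step is the point.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


ids = [f"doc-{i}" for i in range(1000)]

# Count documents whose target shard changes if 5 primaries became 6.
moved = sum(1 for doc_id in ids if route(doc_id, 5) != route(doc_id, 6))
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

Because most documents would land on a different shard after the count changes, lookups by id would go to the wrong place, so changing the number of primaries means reindexing into a new index.
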
Shards and their internals
- Segment / Transaction Log / Refresh / Merge
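Internals of distributed search and aggregation
- Query Then Fetch: IDF is computed per shard rather than globally, so scores can be inaccurate when there is little data
- Increasing "shard_size" improves the accuracy of Terms aggregations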
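
Why "shard_size" matters can be shown with a toy simulation of a Terms aggregation: each shard reports only its local top `shard_size` terms, and the coordinating node merges those partial lists. The shard contents below are contrived to make the effect obvious, and `terms_agg` is a hypothetical helper, not Elasticsearch code.

```python
from collections import Counter


def terms_agg(shards, size, shard_size):
    """Merge per-shard top terms the way a coordinating node would."""
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own top `shard_size` terms.
        local_top = sorted(shard.items(), key=lambda kv: (-kv[1], kv[0]))[:shard_size]
        for term, count in local_top:
            merged[term] += count
    # Keep the overall top `size` terms from the merged partial counts.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


# Term "c" is never first on any shard, yet it is the global winner (24).
shards = [
    {"a": 10, "b": 9, "c": 8},
    {"d": 10, "e": 9, "c": 8},
    {"f": 10, "g": 9, "c": 8},
]

print(terms_agg(shards, size=1, shard_size=1))  # [('a', 10)]
print(terms_agg(shards, size=1, shard_size=3))  # [('c', 24)]
```

With `shard_size=1` the globally most frequent term is never reported by any shard, so the result is wrong; a larger `shard_size` lets each shard send enough candidates for the merge to find it, at the price of more data per shard response.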

2.4 Review: Data Modeling and Why It Matters

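Data modeling
- How ES handles relationships between documents / common steps in data modeling / modeling best practices
Modeling tools
- Index Template / Dynamic Template (field mappings) / Ingest Node (data preprocessing) / Update By Query (rebuilding data) / Reindex / Index Alias (improves maintainability)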
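
One way Index Alias helps maintainability is that an alias can be moved from an old index to a rebuilt one in a single `_aliases` request, so clients that query the alias never need to change. The sketch below builds such a request body; the helper name and the index and alias names (`products`, `products_v1`, `products_v2`) are hypothetical.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases that repoints an alias in one request."""
    return {
        "actions": [
            # Both actions are applied together, so the alias is never missing.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap = alias_swap_actions("products", "products_v1", "products_v2")
print(json.dumps(swap, indent=2))
```

A typical flow under these assumptions would be: create `products_v2` with the new mapping, Reindex from `products_v1`, then send this body to switch the alias.
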
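Best practices
- Avoid too many fields / avoid wildcard queries (especially patterns that start with *) / choose appropriate field settings in the Mapping (see the official documentation)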

demo:

DELETE test
# At index time ES also runs the analyzer: the content is tokenized and lowercased
PUT test/_doc/1
{
  "content":"Hello World"
}

# match analyzes the query string, so it matches the lowercased tokens
POST test/_search
{
  "profile": "true",
  "query": {
    "match": {
      "content": "Hello World"
    }
  }
}
# match analyzes the query string, so the lowercase form matches too
POST test/_search
{
  "profile": "true",
  "query": {
    "match": {
      "content": "hello world"
    }
  }
}
# match on .keyword is rewritten as a term query; the exact value is found
POST test/_search
{
  "profile": "true",
  "query": {
    "match": {
      "content.keyword": "Hello World"
    }
  }
}
# .keyword is not analyzed, so the lowercase query finds nothing
POST test/_search
{
  "profile": "true",
  "query": {
    "match": {
      "content.keyword": "hello world"
    }
  }
}
# term does not analyze the query; the content field holds lowercased tokens, so nothing is found
POST test/_search
{
  "profile": "true",
  "query": {
    "term": {
      "content": "Hello World"
    }
  }
}
# term does not analyze the query; "hello world" is not a single indexed token, so nothing is found
POST test/_search
{
  "profile": "true",
  "query": {
    "term": {
      "content": "hello world"
    }
  }
}
# term on .keyword with the exact original value: found
POST test/_search
{
  "profile": "true",
  "query": {
    "term": {
      "content.keyword": "Hello World"
    }
  }
}
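
The results of the demo above can be reproduced with a toy model of the two fields. This is only a sketch: `analyze` is a very rough stand-in for the standard analyzer (lowercase plus whitespace split), and the helper names are hypothetical.

```python
def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


doc = "Hello World"
text_terms = set(analyze(doc))  # what the `content` text field indexes
keyword_terms = {doc}           # what `content.keyword` indexes (unchanged)


def match_text(query):
    """match on the text field: the query is analyzed, any token may match."""
    return any(token in text_terms for token in analyze(query))


def term_text(query):
    """term on the text field: the query is looked up as-is."""
    return query in text_terms


def term_keyword(query):
    """term (or match) on the keyword field: exact value lookup."""
    return query in keyword_terms


print(match_text("Hello World"))    # True
print(match_text("hello world"))    # True
print(term_keyword("Hello World"))  # True
print(term_keyword("hello world"))  # False
print(term_text("Hello World"))     # False
print(term_text("hello world"))     # False
print(term_text("hello"))           # True
```

The last line shows the one term query that would hit the text field: a single lowercased token.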

Quiz

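True or false: using an Index Alias for indices in production is good practice (true)
In a Terms aggregation, what can improve accuracy? (with a small data set, use 1 primary shard; with a large one, increase shard_size)
Which aggregation tells you how many distinct IPs visit the site each day? (Cardinality)
Describe the behavior of "best_fields" in a "multi_match" query (the score comes from the single best-matching field)
Which two parameters are used to paginate search results? (from, size)
True or false: when exporting data with the Scroll API, documents written midway are also exported (false: the scroll works on a snapshot, and documents indexed after the snapshot is taken are not returned)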
