版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/gwd1154978352/article/details/83343933
ElasticSearch汇总请查看:ElasticSearch教程——汇总篇
分词器
Elasticsearch中,内置了很多分词器(analyzers),例如standard
(标准分词器)、english
(英文分词)和chinese
(中文分词),默认的是standard,
standard tokenizer:以单词边界进行切分
standard token filter:什么都不做
lowercase token filter:将所有字母转换为小写
stop token filer(默认被禁用):移除停用词,比如a the it等等
修改分词器设置
启用english,停用词token filter
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"es_std": {
"type": "standard",
"stopwords": "_english_"
}
}
}
}
}
标准分词测试代码
GET /my_index/_analyze
{
"analyzer": "standard",
"text": "a dog is in the house"
}
结果
{
"tokens": [
{
"token": "a",
"start_offset": 0,
"end_offset": 1,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "dog",
"start_offset": 2,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "is",
"start_offset": 6,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "in",
"start_offset": 9,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "the",
"start_offset": 12,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 4
},
{
"token": "house",
"start_offset": 16,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 5
}
]
}
设置的英文分词测试代码
GET /my_index/_analyze
{
"analyzer": "es_std",
"text":"a dog is in the house"
}
结果
{
"tokens": [
{
"token": "dog",
"start_offset": 2,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "house",
"start_offset": 16,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 5
}
]
}
自定义分词器
PUT /my_index
{
"settings": {
"analysis": {
"char_filter": {
"&_to_and": {
"type": "mapping",
"mappings": ["&=> and"]
}
},
"filter": {
"my_stopwords": {
"type": "stop",
"stopwords": ["the", "a"]
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": ["html_strip", "&_to_and"],
"tokenizer": "standard",
"filter": ["lowercase", "my_stopwords"]
}
}
}
}
}
内容解析
测试代码
GET /my_index/_analyze
{
"text": "tom&jerry are a friend in the house, <a>, HAHA!!",
"analyzer": "my_analyzer"
}
测试结果
{
"tokens": [
{
"token": "tomandjerry",
"start_offset": 0,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "are",
"start_offset": 10,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "friend",
"start_offset": 16,
"end_offset": 22,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "in",
"start_offset": 23,
"end_offset": 25,
"type": "<ALPHANUM>",
"position": 4
},
{
"token": "house",
"start_offset": 30,
"end_offset": 35,
"type": "<ALPHANUM>",
"position": 6
},
{
"token": "HAHA",
"start_offset": 42,
"end_offset": 46,
"type": "<ALPHANUM>",
"position": 7
}
]
}
type中的使用
PUT /my_index/_mapping/my_type
{
"properties": {
"content": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}