Requirement
A product named 雪花啤酒 (Snow Beer) must be found by any of these queries: 雪花, 啤酒, 雪花啤酒, xh, pj, xh啤酒, 雪花pj.
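Installing ik
See https://www.cnblogs.com/LQBlog/p/10443862.html; the step there that modifies the source code is not needed.
Installing the pinyin analyzer
Same as ik: download it, build the package, move it into the es plugins directory, and rename the directory to pinyin.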
Test
GET request: http://127.0.0.1:9200/_analyze
body:
{ "analyzer":"pinyin", "text":"雪花啤酒" }
Response:
{
  "tokens": [
    { "token": "xue", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 },
    { "token": "xhpj", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 },
    { "token": "hua", "start_offset": 0, "end_offset": 0, "type": "word", "position": 1 },
    { "token": "pi", "start_offset": 0, "end_offset": 0, "type": "word", "position": 2 },
    { "token": "jiu", "start_offset": 0, "end_offset": 0, "type": "word", "position": 3 }
  ]
}
This shows that the plugin was installed successfully.
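To make the response easier to read, the tokens can be grouped by position. The sketch below is only illustrative: it hard-codes the tokens from the response above instead of calling Elasticsearch, and the helper name tokens_by_position is made up for this example.

```python
import json
from collections import defaultdict

# Tokens copied from the _analyze response above
# (offset and type fields omitted for brevity).
RESPONSE = json.loads("""
{"tokens": [
  {"token": "xue",  "position": 0},
  {"token": "xhpj", "position": 0},
  {"token": "hua",  "position": 1},
  {"token": "pi",   "position": 2},
  {"token": "jiu",  "position": 3}
]}
""")


def tokens_by_position(response):
    """Group token strings by position, keeping the response order."""
    grouped = defaultdict(list)
    for tok in response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(grouped)


if __name__ == "__main__":
    for position, tokens in sorted(tokens_by_position(RESPONSE).items()):
        print(position, tokens)
```

Grouped this way, the first-letter token xhpj shares position 0 with xue, while the full-pinyin syllables occupy positions 0 through 3.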
Testing mixed Chinese and pinyin search
Custom mapping and custom analyzer
PUT request: http://127.0.0.1:9200/opcm3
body:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": { // define a custom analyzer named ik_pinyin_analyzer
          "type": "custom", // marks this as a custom analyzer
          "tokenizer": "ik_smart", // tokenize with ik; ik_smart is coarse-grained, ik_max_word is the finest-grained
          "filter": ["my_pinyin"] // tokens from the tokenizer are passed to this filter for further processing
        }
      },
      "filter": {
        "my_pinyin": { // define a token filter that uses pinyin internally
          "type": "pinyin"
        }
      }
    }
  },
  "mappings": { // custom mapping
    "topic": { // type
      "properties": {
        "productName": { // field
          "type": "text",
          "analyzer": "ik_pinyin_analyzer" // use the custom analyzer
        }
      }
    }
  }
}
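The // annotations in this body are explanatory, and a strict JSON parser will reject them. As a rough sketch, the same body can be built as a plain Python dict and serialized without comments. The function name build_index_body and its parameters are assumptions made for this example; nothing is sent to Elasticsearch.

```python
import json


def build_index_body(field="productName", doc_type="topic"):
    """Return the index-creation body from the text as a plain dict (no comments)."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": ["my_pinyin"],
                    }
                },
                "filter": {"my_pinyin": {"type": "pinyin"}},
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": "text", "analyzer": "ik_pinyin_analyzer"}
                }
            }
        },
    }


if __name__ == "__main__":
    # Print strict JSON that can be pasted as the PUT body.
    print(json.dumps(build_index_body(), indent=2))
```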
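My understanding of filter
As I understand it, ik tokenizes the text first, and each resulting token is then handed through the filter to the pinyin tokenizer. ik splits 雪花啤酒 into 雪花 and 啤酒; pinyin then turns 雪花 into xue, hua, xh and 啤酒 into pi, jiu, pj.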
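The flow described above can be mimicked with a toy pipeline. This is not how the ik or pinyin plugins are implemented: the segmentation table, the pinyin table, and every function name below are hard-coded stand-ins invented for this example, covering only the one product name.

```python
# Toy stand-ins, NOT the real plugins: a hard-coded segmentation and pinyin table.
TOY_IK_SEGMENTS = {"雪花啤酒": ["雪花", "啤酒"]}
TOY_PINYIN = {"雪": "xue", "花": "hua", "啤": "pi", "酒": "jiu"}


def toy_tokenizer(text):
    """Stand-in for the ik_smart tokenizer: look the text up in the toy table."""
    return TOY_IK_SEGMENTS[text]


def toy_pinyin_filter(token):
    """Stand-in for the pinyin filter: full pinyin per character, then the initials."""
    syllables = [TOY_PINYIN[ch] for ch in token]
    initials = "".join(s[0] for s in syllables)
    return syllables + [initials]


def toy_analyze(text):
    """Run the tokenizer, then pass every token through the filter."""
    terms = []
    for token in toy_tokenizer(text):
        terms.extend(toy_pinyin_filter(token))
    return terms


if __name__ == "__main__":
    print(toy_analyze("雪花啤酒"))
```

The key point of the sketch is the order: the filter runs once per tokenizer output, so the initials are xh and pj per word rather than a single xhpj for the whole string.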
Test
PUT request: http://127.0.0.1:9200/opcm3/topic/1
body:
{ "productName":"雪花啤酒" }
Check how this document was tokenized
GET request: http://127.0.0.1:9200/opcm3/topic/1/_termvectors?fields=productName
Result:
{
  "_index": "opcm3",
  "_type": "topic",
  "_id": "1",
  "_version": 1,
  "found": true,
  "took": 40,
  "term_vectors": {
    "productName": {
      "field_statistics": { "sum_doc_freq": 6, "doc_count": 1, "sum_ttf": 6 },
      "terms": {
        "hua": { "term_freq": 1, "tokens": [ { "position": 1, "start_offset": 0, "end_offset": 2 } ] },
        "jiu": { "term_freq": 1, "tokens": [ { "position": 3, "start_offset": 2, "end_offset": 4 } ] },
        "pi": { "term_freq": 1, "tokens": [ { "position": 2, "start_offset": 2, "end_offset": 4 } ] },
        "pj": { "term_freq": 1, "tokens": [ { "position": 3, "start_offset": 2, "end_offset": 4 } ] },
        "xh": { "term_freq": 1, "tokens": [ { "position": 0, "start_offset": 0, "end_offset": 2 } ] },
        "xue": { "term_freq": 1, "tokens": [ { "position": 0, "start_offset": 0, "end_offset": 2 } ] }
      }
    }
  }
}
GET request: http://127.0.0.1:9200/opcm3/topic/_search
{ "query":{ "match":{ "productName":"雪花啤" } } }
At this point, searching for xh啤酒, 雪花pj, xh, and so on all return the document.
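To try every query from the requirement, the match body can be generated for each one. This is a minimal sketch that only builds and prints the request bodies. The helper name build_match_query is an assumption for this example, and whether each query actually matches depends on the installed plugin versions and their settings.

```python
import json

# Queries listed in the requirement at the top of the post.
QUERIES = ["雪花", "啤酒", "雪花啤酒", "xh", "pj", "xh啤酒", "雪花pj"]


def build_match_query(text, field="productName"):
    """Return a match query body for the given text."""
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    # Each printed line can be used as the body of the _search request.
    for q in QUERIES:
        print(json.dumps(build_match_query(q), ensure_ascii=False))
```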