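Adding custom words that contain spaces and getting them recognized during segmentation
Method: find the relevant module-level variable in the jieba source and override it
Example: make a word with a space in the middle, such as "Blade Master", come out as a single token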
jieba
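import jieba, re
sentence = 'Blade Master疾风刺杀Archmage'
jieba.add_word('Blade Master')  # add the word to the dictionary
print([word for word in jieba.cut(sentence)])  # before the override
jieba.re_han_default = re.compile('(.+)', re.U)  # override the block pattern so spaces are no longer split points
print([word for word in jieba.cut(sentence)])  # after the override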
- Printed output:
['Blade', ' ', 'Master', '疾风', '刺杀', 'Archmage']
['Blade Master', '疾风', '刺杀', 'Archmage']
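Why the override matters: before looking anything up in the dictionary, jieba splits the sentence into blocks with `re_han_default`, so a space that falls outside the pattern's character class cuts "Blade Master" in two before `add_word` can have any effect. The sketch below reproduces that splitting step with plain `re`. `DEFAULT_LIKE` is an assumed approximation of jieba's default pattern (the exact character class differs between jieba versions), and `split_blocks` is a hypothetical helper written only for this illustration.

```python
import re

# Assumed approximation of jieba's default block pattern; the exact
# character class may differ between jieba versions.
DEFAULT_LIKE = re.compile(r"([\u4E00-\u9FD5a-zA-Z0-9+#&._%\-]+)", re.U)
# The permissive pattern used in this post: anything except a newline.
PERMISSIVE = re.compile(r"(.+)", re.U)


def split_blocks(pattern, sentence):
    """Split on a capturing pattern and drop the empty strings (hypothetical helper)."""
    return [block for block in pattern.split(sentence) if block]


sentence = 'Blade Master疾风刺杀Archmage'
default_blocks = split_blocks(DEFAULT_LIKE, sentence)
permissive_blocks = split_blocks(PERMISSIVE, sentence)
print(default_blocks)     # the space becomes a block of its own
print(permissive_blocks)  # the whole sentence stays in one block
```

With the permissive pattern the whole sentence reaches the dictionary lookup as one block, which is what lets the added word match. Note that `.` does not match a newline, so line breaks would still act as split points.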
jieba.posseg
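import jieba, jieba.posseg as jp, re
sentence = 'Demon Hunter斩杀大法师'
jieba.add_word('Demon Hunter', 9, 'hero')  # add the word with frequency 9 and tag 'hero'
print(jp.lcut(sentence))  # before the override
jp.re_han_internal = re.compile('(.+)', re.U)  # override the block pattern used by posseg
print(jp.lcut(sentence))  # after the override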
- Printed output:
[pair('Demon', 'eng'), pair(' ', 'x'), pair('Hunter', 'eng'), pair('斩杀', 'v'), pair('大法师', 'n')]
[pair('Demon Hunter', 'x'), pair('斩杀', 'v'), pair('大法师', 'n')]
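Overriding a module-level variable affects every later call in the same process. If that is a concern, one option is to apply the override only temporarily. The following is a minimal sketch of that idea, not part of jieba: `override_attr` is a hypothetical helper, and `fake_module` is a stand-in object so the example runs without jieba installed.

```python
import re
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_attr(module, name, value):
    """Temporarily set module.<name> to value and restore the original on exit."""
    original = getattr(module, name)
    setattr(module, name, value)
    try:
        yield
    finally:
        setattr(module, name, original)


# Stand-in for a module such as jieba or jieba.posseg (hypothetical, for illustration).
fake_module = SimpleNamespace(re_han_default=re.compile(r"([a-zA-Z0-9]+)", re.U))

with override_attr(fake_module, "re_han_default", re.compile(r"(.+)", re.U)):
    inside = fake_module.re_han_default.pattern  # the permissive pattern is active here
after = fake_module.re_han_default.pattern       # the original pattern is back

print(inside)
print(after)
```

With jieba itself, the same helper would presumably be called as `override_attr(jieba, 're_han_default', ...)` or `override_attr(jp, 're_han_internal', ...)`. Inside the `with` block, use `jieba.lcut` or `jp.lcut` rather than the lazy `jieba.cut` generator, so that segmentation actually runs while the override is in place.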