Code:

```python
import jieba.analyse

sentence = "我爱北京天安门"

# Extract keywords
keywords = jieba.analyse.extract_tags(sentence, topK=20, withWeight=False, allowPOS=())
# Extract keywords from sentence using the TF-IDF algorithm.
# Parameters:
#   - topK: how many top keywords to return. `None` returns all candidate words.
#   - withWeight: if True, return a list of (word, weight);
#                 if False, return a list of words.
#   - allowPOS: the allowed POS list, e.g. ['ns', 'n', 'vn', 'v', 'nr'].
#               Words whose POS is not in this list are filtered out.
#   - withFlag: only takes effect when allowPOS is not empty.
#               If True, return a list of pair(word, flag) like posseg.cut;
#               if False, return a list of words.
print("===" * 20)
print(keywords)

# With weights
keywords = jieba.analyse.extract_tags(sentence, topK=20, withWeight=True, allowPOS=())
print("===" * 20)
for tup in keywords:
    print("%s %.4f" % tup)
```
Output:

```
============================================================
['天安门', '北京']
============================================================
天安门 4.4977
北京 2.3337
```
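To see where numbers like 4.4977 come from, here is a minimal pure-Python sketch of the TF-IDF scoring step. It assumes the sentence segments into 我 / 爱 / 北京 / 天安门, and the values in `IDF` are hypothetical, back-computed from the output above rather than read from jieba's dictionary. The helper `tfidf_weights` is illustrative and not part of jieba.

```python
from collections import Counter

# Assumed segmentation of "我爱北京天安门" (jieba's tokenizer is not called here).
TOKENS = ["我", "爱", "北京", "天安门"]

# Hypothetical IDF table, back-computed from the weights printed above
# (weight = idf / 2 here, because two tokens survive filtering).
IDF = {"天安门": 8.9954, "北京": 4.6674}


def tfidf_weights(tokens, idf, stop_words=frozenset()):
    """Sketch of TF-IDF keyword scoring: (count / total) * idf."""
    # Drop single-character tokens and stop words before counting.
    counts = Counter(t for t in tokens if len(t.strip()) >= 2 and t not in stop_words)
    total = sum(counts.values())
    # Words missing from the IDF table fall back to the table's median IDF.
    median_idf = sorted(idf.values())[len(idf) // 2]
    return {w: c / total * idf.get(w, median_idf) for w, c in counts.items()}


weights = tfidf_weights(TOKENS, IDF)
for word, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print("%s %.4f" % (word, weight))
```

This prints 天安门 4.4977 and 北京 2.3337. 我 and 爱 never get a score because single-character tokens are discarded before counting.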
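Code:

```python
# Instantiate a TFIDF extractor.
# jieba.analyse.extract_tags is bound to a default TFIDF instance,
# so a fresh instance with the default IDF file gives the same result.
Tfidf = jieba.analyse.TFIDF()
keywords = Tfidf.extract_tags(sentence, topK=20, withWeight=False, allowPOS=())
print("===" * 20)
print(keywords)

# With weights
keywords2 = Tfidf.extract_tags(sentence, topK=20, withWeight=True, allowPOS=())
print("===" * 20)
for tup in keywords2:
    print("%s %.4f" % tup)
```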
Output:

```
============================================================
['天安门', '北京']
============================================================
天安门 4.4977
北京 2.3337
```
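The instance method takes the same parameters as the module-level function. The sketch below uses a hypothetical helper, `select_tags`, to mimic how `topK`, `withWeight`, `allowPOS` and `withFlag` shape the returned list as described in the docstring comments above. The scored triples (including the extra word 喜欢) are made-up inputs; the real library does this work inside `extract_tags`.

```python
# Hypothetical (word, POS flag, weight) triples, as if already scored by TF-IDF.
SCORED = [("北京", "ns", 2.3337), ("天安门", "ns", 4.4977), ("喜欢", "v", 1.2)]


def select_tags(scored, topK=20, withWeight=False, allowPOS=(), withFlag=False):
    """Sketch of how topK, withWeight, allowPOS and withFlag shape the result."""
    allowed = frozenset(allowPOS)
    items = []
    for word, flag, weight in scored:
        if allowed and flag not in allowed:
            continue  # POS not in allowPOS: filtered out
        # withFlag only has an effect when allowPOS is non-empty.
        key = (word, flag) if (allowed and withFlag) else word
        items.append((key, weight))
    # Highest weight first; the sort is stable, so ties keep their input order.
    items.sort(key=lambda kv: kv[1], reverse=True)
    if topK:  # topK=None keeps every candidate
        items = items[:topK]
    return items if withWeight else [key for key, _ in items]


print(select_tags(SCORED))                                      # ['天安门', '北京', '喜欢']
print(select_tags(SCORED, topK=1, withWeight=True))             # [('天安门', 4.4977)]
print(select_tags(SCORED, allowPOS=("ns",), withFlag=True))     # [('天安门', 'ns'), ('北京', 'ns')]
```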
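Code:

```python
# Give 天安门 a very low IDF: load a custom IDF file (one "word idf" pair per line).
# set_idf_path only affects this TFIDF instance, not jieba.analyse.extract_tags.
Tfidf.set_idf_path('idf.txt')
keywords3 = Tfidf.extract_tags(sentence, topK=20, withWeight=False, allowPOS=())
print("===" * 20)
print(keywords3)

# With weights
keywords3 = Tfidf.extract_tags(sentence, topK=20, withWeight=True, allowPOS=())
print("===" * 20)
for tup in keywords3:
    print("%s %.4f" % tup)
```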
Output:

```
============================================================
['北京', '天安门']
============================================================
北京 0.0500
天安门 0.0500
```
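Why do both words end up at 0.0500? One plausible explanation, assuming idf.txt lists only 天安门 with an IDF of 0.1 (the actual file contents are not shown): words absent from the custom file fall back to the file's median IDF, so 北京 inherits the same low value. The hypothetical `load_idf` below parses the "word idf" format and reproduces that arithmetic.

```python
import os
import tempfile

# Hypothetical contents of a custom idf.txt: one "word idf" pair per line.
# Only 天安门 is listed, with a deliberately low IDF.
IDF_TEXT = "天安门 0.1\n"


def load_idf(path):
    """Parse a 'word idf' file and return (idf_dict, median_idf)."""
    idf = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            word, value = line.split(" ")
            idf[word] = float(value)
    # Median taken as the upper-middle value of the sorted IDFs.
    median = sorted(idf.values())[len(idf) // 2]
    return idf, median


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "idf.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(IDF_TEXT)
    idf, median_idf = load_idf(path)

# Two surviving tokens, each seen once: weight = (1 / 2) * idf.
weights = {w: 1 / 2 * idf.get(w, median_idf) for w in ["北京", "天安门"]}
for word, weight in weights.items():
    print("%s %.4f" % (word, weight))
```

Under that assumption the sketch prints 北京 0.0500 and 天安门 0.0500. With equal weights, the order of the result list is decided by how ties are sorted, not by importance.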
Code:

```python
# Extract keywords from the text with TextRank
keywords = jieba.analyse.textrank(sentence, topK=20, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v'))
print("===" * 20)
print(keywords)

# With weights
keywords = jieba.analyse.textrank(sentence, topK=20, withWeight=True, allowPOS=('ns', 'n', 'vn', 'v'))
print("===" * 20)
for tup in keywords:
    print("%s %.4f" % tup)
```
Output:

```
============================================================
['天安门', '北京']
============================================================
天安门 1.0000
北京 0.9961
```
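TextRank scores come from a word co-occurrence graph rather than from an IDF table. The following is a simplified sketch modeled on what jieba's TextRank appears to do by default (co-occurrence window of 5 tokens, damping factor 0.85, 10 iterations, rescaling so the best word scores 1.0). The POS tags in `TAGGED` are assumed, and `textrank` here is an illustrative function, not jieba's API.

```python
from collections import defaultdict

# Assumed POS-tagged tokens for "我爱北京天安门".
TAGGED = [("我", "r"), ("爱", "v"), ("北京", "ns"), ("天安门", "ns")]
ALLOW_POS = frozenset(("ns", "n", "vn", "v"))


def textrank(tagged, allow_pos=ALLOW_POS, span=5, d=0.85, iterations=10):
    """Sketch of TextRank keyword scoring over a co-occurrence graph."""
    def keep(word, flag):
        return flag in allow_pos and len(word.strip()) >= 2

    # Count co-occurrences of kept words within a window of `span` tokens.
    cooc = defaultdict(int)
    for i, (w1, f1) in enumerate(tagged):
        if not keep(w1, f1):
            continue
        for w2, f2 in tagged[i + 1 : i + span]:
            if keep(w2, f2):
                cooc[(w1, w2)] += 1

    # Undirected weighted graph: node -> list of (neighbour, edge weight).
    graph = defaultdict(list)
    for (w1, w2), weight in cooc.items():
        graph[w1].append((w2, weight))
        graph[w2].append((w1, weight))
    if not graph:
        return {}

    out_sum = {n: sum(wt for _, wt in edges) for n, edges in graph.items()}
    score = {n: 1.0 / len(graph) for n in graph}
    # PageRank-style update, applied in place in sorted node order.
    for _ in range(iterations):
        for n in sorted(graph):
            s = sum(wt / out_sum[m] * score[m] for m, wt in graph[n])
            score[n] = (1 - d) + d * s

    # Rescale so that the top-ranked word gets exactly 1.0.
    lo, hi = min(score.values()), max(score.values())
    return {n: (v - lo / 10) / (hi - lo / 10) for n, v in score.items()}


ranks = textrank(TAGGED)
for word, weight in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print("%s %.4f" % (word, weight))
```

With only one edge, 北京 and 天安门 are symmetric, yet 北京 ends slightly below 1.0. In this sketch the asymmetry comes from updating scores in place in sorted order, so 天安门 always sees 北京's freshly updated score; under these assumptions it prints 天安门 1.0000 and 北京 0.9961.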
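Code:

```python
# Instantiate a TextRank extractor; textrank() then uses its defaults
# (topK=20, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v'))
Textrank = jieba.analyse.TextRank()
keywords = Textrank.textrank(sentence)
print(keywords)
```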
Output:

```
['天安门', '北京']
```