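Sentiment Analysis with Naive Bayes

In the previous chapter we briefly covered the principle of naive Bayes and a few simple applications, and mentioned that it is mainly used for text analysis, spam filtering, sentiment analysis, and similar tasks. Here we'll do a simple sentiment analysis to judge whether reviewers like a particular product on JD.com (京东).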

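1. Data collection
The data are positive and negative reviews of one product that we scraped from JD.com; the scraping process isn't covered in detail here.
Positive reviews:

Negative reviews: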

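2. Data processing
As we saw in the previous section, raw text can't be used for training directly. Before training, we need to segment each sentence into words and build word vectors, so we start with word segmentation. I chose jieba (结巴分词) for this.

Word segmentation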

import jieba

# Load the stop-word list, one word per line
def stopwordslist(filepath):
    with open(filepath, 'r', encoding='utf-8') as f:
        stopwords = [line.strip() for line in f]

    return stopwords

# Segment a sentence and drop stop words
def wordCut(sentence):
    words = jieba.cut(sentence.strip())  # segment with jieba
    stopwords = stopwordslist('C:\\Users\\John\\Desktop\\emotion Analysis\\stopKeyWords.txt')  # path to the stop-word file; change it to your own
    outstr = []
    for word in words:
        if word not in stopwords:  # keep only words that are not stop words
            if word != '\t':
                outstr.append(word)

    return outstr  # list of tokens after segmentation and stop-word removal

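Building the word lists

import pandas as pd

def DataHandle(filename, flag):
    out = []
    lines = pd.read_table(filename, header=None, encoding='utf-8', names=['评论'])  # read the review file, one review per line ('评论' = review)
    for line in lines['评论']:
        line = str(line)
        outStr = wordCut(line)  # segment the review
        out.append(outStr)

    if flag:
        vec = [1] * lines.shape[0]  # positive reviews: a list of all 1s
    else:
        vec = [0] * lines.shape[0]  # negative reviews: a list of all 0s

    return vec, out  # class labels vec, then the tokenized reviews out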

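Merging positive and negative reviews

goodVec, goodList = DataHandle(goodDataPath, 1)  # goodDataPath: path to the positive-review file
badVec, badList = DataHandle(badDataPath, 0)  # badDataPath: path to the negative-review file

listClasses = goodVec + badVec  # labels: 1 = positive, 0 = negative
listOPosts = goodList + badList  # tokenized reviews, in the same order as the labels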


Building the vocabulary
Collect every distinct word in the training set into a single list.

   myVocabList = bayes.createVocabList(listOPosts)
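
The `bayes` module itself isn't shown in this post, so here is a minimal sketch of what a helper like `createVocabList` might do, assuming it simply takes the union of all tokens. The sorting step and the toy reviews are our own additions for illustration, not the original implementation.

```python
def createVocabList(dataSet):
    """Return the distinct tokens found across all tokenized documents."""
    vocabSet = set()
    for document in dataSet:
        vocabSet |= set(document)  # union in this document's tokens
    # Assumption: sort so that each word keeps a reproducible position
    return sorted(vocabSet)

# Toy tokenized reviews (hypothetical data)
docs = [["质量", "很", "好"], ["质量", "太", "差"]]
vocab = createVocabList(docs)
print(len(vocab))  # 5 distinct tokens
```

Each word's position in this list becomes its index in the feature vectors built in the next step.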

3. Vectorization

    trainMat = []  # one feature vector per review
    for postinDoc in listOPosts:
        trainMat.append(bayes.setOfWords2Vec(myVocabList, postinDoc))
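
To make the vectorization step concrete, here is a rough sketch of a set-of-words encoder. It assumes `setOfWords2Vec` marks each vocabulary word as present (1) or absent (0) and ignores words outside the vocabulary; the real `bayes` module may differ in details.

```python
def setOfWords2Vec(vocabList, inputSet):
    """Return a 0/1 vector marking which vocabulary words occur in inputSet."""
    index = {word: i for i, word in enumerate(vocabList)}
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in index:  # words outside the vocabulary are ignored
            returnVec[index[word]] = 1
    return returnVec

# Toy vocabulary and review (hypothetical data)
vocab = ["好", "差", "质量", "物流"]
print(setOfWords2Vec(vocab, ["质量", "好", "好", "便宜"]))  # [1, 0, 1, 0]
```

Note that a repeated word still yields 1: a set-of-words model records presence only, not counts.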

4. Model training

    # array is numpy.array (from numpy import array)
    p0V, p1V, pAb = bayes.trainNB0(array(trainMat), array(listClasses))
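
For readers who don't have the `bayes` module at hand, the training step can be sketched roughly as follows. This is an assumption about what `trainNB0` computes, not its actual source: per-class word log-likelihoods with add-one smoothing, plus the prior of class 1. The sketch uses plain Python lists instead of NumPy arrays, and the initial denominator of 2 is one common convention rather than the only choice.

```python
import math

def trainNB0(trainMatrix, trainCategory):
    """Return (log P(word|class 0), log P(word|class 1), P(class 1))."""
    numDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pClass1 = sum(trainCategory) / numDocs  # prior probability of class 1
    # Smoothing: start every word count at 1 and each class total at 2,
    # so a word never seen in a class does not get probability 0
    p0Num, p1Num = [1.0] * numWords, [1.0] * numWords
    p0Denom, p1Denom = 2.0, 2.0
    for row, label in zip(trainMatrix, trainCategory):
        if label == 1:
            p1Num = [a + b for a, b in zip(p1Num, row)]
            p1Denom += sum(row)
        else:
            p0Num = [a + b for a, b in zip(p0Num, row)]
            p0Denom += sum(row)
    # Logs turn the later product of probabilities into a sum (avoids underflow)
    p0Vect = [math.log(n / p0Denom) for n in p0Num]
    p1Vect = [math.log(n / p1Denom) for n in p1Num]
    return p0Vect, p1Vect, pClass1

# Two toy reviews over a 3-word vocabulary: the first positive, the second negative
p0V, p1V, pAb = trainNB0([[1, 0, 1], [0, 1, 1]], [1, 0])
print(pAb)  # 0.5
```

The return order mirrors the call above: the class-0 vector, the class-1 vector, then the prior of class 1.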

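5. Testing on new input

    inputS = input('Please enter your review of this product: ')

    testEntry = wordCut(inputS)  # segment the input the same way as the training data
    print(testEntry)
    thisDoc = array(bayes.setOfWords2Vec(myVocabList, testEntry))
    print('Rating', bayes.classifyNB(thisDoc, p0V, p1V, pAb))  # 1 = positive, 0 = negative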
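
The classification call can be sketched in the same spirit. We assume `classifyNB` compares the two log-posterior scores and returns the label with the larger one; the parameters below are hand-picked toy values, not the output of a real training run.

```python
import math

def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    """Return 1 if class 1 has the higher log-posterior score, else 0."""
    # Sum the log-likelihoods of the words that are present, then add the log prior
    p1 = sum(x * w for x, w in zip(vec2Classify, p1Vec)) + math.log(pClass1)
    p0 = sum(x * w for x, w in zip(vec2Classify, p0Vec)) + math.log(1.0 - pClass1)
    return 1 if p1 > p0 else 0

# Hypothetical parameters for a 3-word vocabulary
p0V = [math.log(0.2), math.log(0.6), math.log(0.2)]
p1V = [math.log(0.6), math.log(0.2), math.log(0.2)]
pAb = 0.5
print(classifyNB([1, 0, 1], p0V, p1V, pAb))  # 1
print(classifyNB([0, 1, 0], p0V, p1V, pAb))  # 0
```

Under this sketch an exact tie falls to class 0; that is an arbitrary choice here.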


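6. Analysis
From the results above, we can more or less tell how people feel about this product, but there is still room for improvement. Different segmentation methods and different vectorization methods each affect the result; we'll look at these in later posts.
7. Code
Sentiment Analysis with Naive Bayes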
