[Machine Learning Course, University of Washington]: 1 Case Studies 1.3 Classification (2) Classifying Amazon Product Reviews

1. Import the library and load the data

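import graphlab
# Limit the number of Python lambda worker processes GraphLab starts
graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 4)

# Load the Amazon baby product reviews into an SFrame
products = graphlab.SFrame('amazon_baby.gl/')
products.head()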


2. Build the word_count vector

products['word_count'] = graphlab.text_analytics.count_words(products['review'])
products.head()
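
For intuition, each entry in the word_count column is a dictionary that maps a word to the number of times it appears in that review. The snippet below is a minimal plain-Python sketch of that idea, not GraphLab's implementation. The helper name count_words_sketch and the sample review are made up for illustration, and the tokenization (lowercase, then split on whitespace) is a simplifying assumption.

```python
from collections import Counter

def count_words_sketch(text):
    """Return a dict of word -> count for one review (hypothetical helper)."""
    # Assumption: lowercase the text and split on whitespace only.
    return dict(Counter(text.lower().split()))

# Made-up review text, not taken from the dataset.
sample_review = "Love it love it my baby loves it"
sample_counts = count_words_sketch(sample_review)
print(sample_counts)
```

Note that this simple tokenizer treats "love" and "loves" as different words, and it does not strip punctuation.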


3. Examine the reviews of the Giraffe teether, one of the most popular products

giraffe_reviews = products[products['name'] == 'Vulli Sophie the Giraffe Teether']


4. Split the reviews into positive and negative

# ignore all 3* reviews
products = products[products['rating'] != 3]
# positive sentiment = 4* or 5* reviews
products['sentiment'] = products['rating'] >= 4
products.head()
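
The labeling rule above can be checked on a handful of ratings without GraphLab. This is a small sketch using plain Python lists; the ratings are hypothetical values chosen for illustration.

```python
# Hypothetical star ratings for five reviews (not from the dataset).
ratings = [5, 3, 1, 4, 2]

# Drop the neutral 3-star reviews, as in the step above.
kept = [r for r in ratings if r != 3]

# 1 marks a positive review (4 or 5 stars), 0 a negative one (1 or 2 stars).
sentiment = [1 if r >= 4 else 0 for r in kept]
print(list(zip(kept, sentiment)))
```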


5. Split the dataset, then train and evaluate the model

train_data, test_data = products.random_split(.8, seed=0)
sentiment_model = graphlab.logistic_classifier.create(train_data,
                                                     target='sentiment',
                                                     features=['word_count'],
                                                     validation_set=test_data)
sentiment_model.evaluate(test_data, metric='roc_curve')
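
An ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. The sketch below computes a few such points by hand on made-up labels and probabilities. The function name roc_point is hypothetical, and treating a probability equal to the threshold as a positive prediction is an assumption of this sketch.

```python
def roc_point(labels, probabilities, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probabilities):
        # Assumption: predict positive when the probability reaches the threshold.
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return fp / (fp + tn), tp / (tp + fn)

# Made-up true labels and predicted probabilities for five reviews.
labels = [1, 1, 1, 0, 0]
probabilities = [0.9, 0.8, 0.3, 0.6, 0.1]
roc_points = [roc_point(labels, probabilities, t) for t in (0.25, 0.5, 0.75)]
print(roc_points)
```

Raising the threshold lowers the false positive rate, but it can also lower the true positive rate.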


6. Use the model to predict sentiment for the Giraffe reviews

giraffe_reviews['predicted_sentiment'] = sentiment_model.predict(giraffe_reviews, output_type='probability')
giraffe_reviews.head()
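
With output_type='probability', each review gets a number between 0 and 1 rather than a hard label. A logistic classifier obtains it by passing a weighted sum of the features through the sigmoid function. The sketch below shows that calculation on word counts; the weights, the intercept and the two tiny reviews are hypothetical values, not the coefficients learned by sentiment_model.

```python
import math

def predict_probability(word_count, weights, intercept=0.0):
    """Sigmoid of intercept + sum(weight * count); unknown words weigh 0."""
    score = intercept + sum(weights.get(word, 0.0) * count
                            for word, count in word_count.items())
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients, chosen only for illustration.
weights = {'love': 1.5, 'great': 1.0, 'broke': -2.0}

positive_review = {'love': 2, 'it': 1}   # score = 3.0
negative_review = {'broke': 1, 'it': 1}  # score = -2.0

p_pos = predict_probability(positive_review, weights)
p_neg = predict_probability(negative_review, weights)
print(p_pos, p_neg)
```

A probability above 0.5 corresponds to a positive score, and a probability below 0.5 to a negative one.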


7. Quiz

To see the most frequently used words in word_count, in ranked order, use the following code:

products['word_count'].show()



The most frequent words are:

the and to i a it this is for my of
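
The same kind of ranking can be reproduced outside GraphLab Canvas by summing the per-review dictionaries. This is a rough sketch with collections.Counter; the three word-count dictionaries are made up, so the resulting order is only illustrative.

```python
from collections import Counter

# Made-up per-review word counts standing in for products['word_count'].
review_counts = [
    {'the': 2, 'and': 1, 'toy': 1},
    {'the': 1, 'it': 1},
    {'and': 2, 'it': 1, 'the': 1},
]

# Add up the counts of every word across all reviews.
totals = Counter()
for counts in review_counts:
    totals.update(counts)

# Keep the three most frequent words, most frequent first.
top_words = [word for word, _ in totals.most_common(3)]
print(top_words)
```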



Reposted from blog.csdn.net/weixin_41770169/article/details/80801121