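sklearn SVM: Wine Quality Prediction (10)

Continuing with the wine quality prediction example, we now set the SVM parameters to improve classification accuracy.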

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Fri Aug 14 16:13:29 2018
SVM wine quality prediction
@author: [email protected]
@blog:https://blog.csdn.net/myhaspl
"""
import pandas as pd
import numpy as np
from sklearn import svm

print("Preparing data")
# The white-wine CSV files use ";" as the field separator
testDf=pd.read_csv("winequality-white-test.csv",sep=";")
wineDf=pd.read_csv("winequality-white.csv",sep=";")

# The last column is the quality label; all other columns are features
dataColName=wineDf.columns
rsColName=dataColName[-1]
ftColName=list(dataColName[:-1])

testFeature=testDf[ftColName].values
testResult=testDf[rsColName].values
# Train on the first 100 rows only
wineFeature=wineDf[ftColName].values[:100]
wineResult=wineDf[rsColName].values[:100]


print("Building model")
clf = svm.SVC(gamma='scale',kernel='poly',C=0.8,degree=3)
print("Training model")
clf.fit(wineFeature,wineResult)
print("Test results")
y_pred=clf.predict(testFeature)
print(y_pred)
print(testResult)
# Element-wise comparison; summing the boolean array counts the correct predictions
predWine=np.equal(y_pred,testResult)
correctCount=int(np.sum(predWine))
print("Correct samples: %d, total test samples: %d, accuracy: %g"%(correctCount,len(testResult),float(correctCount)/len(testResult)))
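
The script above fixes kernel, C and degree by hand. One way to explore such settings more systematically is a cross-validated grid search. The following is only a minimal sketch, not the method used in this post: it relies on scikit-learn's GridSearchCV, it runs on synthetic data from make_classification (an assumption made so the snippet works without the CSV files), and the candidate values in param_grid are arbitrary illustrations.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the wine data (assumption): 11 features, 3 classes
X, y = make_classification(n_samples=300, n_features=11, n_informative=6,
                           n_classes=3, random_state=0)

# Candidate settings to try; these values are illustrative, not tuned
param_grid = {
    "kernel": ["poly", "rbf"],
    "C": [0.5, 0.8, 1.0],
    "degree": [2, 3],  # only used by the poly kernel
}

# 3-fold cross-validation over every combination in param_grid
search = GridSearchCV(svm.SVC(gamma="scale"), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print("best cross-validated accuracy: %g" % search.best_score_)
```

To try this on the real data, X and y would be replaced by wineFeature and wineResult; the best combination found on one data set does not necessarily carry over to another.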

The accuracy improves further:
Preparing data
Building model
Training model
Test results
[6 5 6 6 6 6 7 5 7 5 4 4 4 4 7 5 8 6 5 5 7 5 6 6 5 5 5 5 5 5 5 4 6 6 6 6 5
6 5 5 5 5 5 5 5 5 5 5 5 6 6 4 5 5 5 4]
[6 6 5 6 6 5 7 5 8 5 6 5 5 6 8 5 7 7 5 5 6 6 5 6 5 6 6 6 5 6 6 5 7 7 7 6 6
7 4 6 5 5 5 5 5 6 5 6 6 5 6 5 5 5 5 4]
Correct samples: 24, total test samples: 56, accuracy: 0.428571
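
As a cross-check, the accuracy can be recomputed from the two printed arrays using only the standard library. This is a small sketch; the names y_pred, y_true and error_counts are chosen here for illustration. It also tallies how far each prediction is from the true quality grade, which shows whether the misses are near misses.

```python
from collections import Counter

# Predictions and true labels copied from the printed output above
y_pred = [int(v) for v in """
6 5 6 6 6 6 7 5 7 5 4 4 4 4 7 5 8 6 5 5 7 5 6 6 5 5 5 5 5 5 5 4 6 6 6 6 5
6 5 5 5 5 5 5 5 5 5 5 5 6 6 4 5 5 5 4
""".split()]
y_true = [int(v) for v in """
6 6 5 6 6 5 7 5 8 5 6 5 5 6 8 5 7 7 5 5 6 6 5 6 5 6 6 6 5 6 6 5 7 7 7 6 6
7 4 6 5 5 5 5 5 6 5 6 6 5 6 5 5 5 5 4
""".split()]

# Count positions where prediction and label agree
correct = sum(1 for p, t in zip(y_pred, y_true) if p == t)
accuracy = float(correct) / len(y_true)
print("correct=%d total=%d accuracy=%g" % (correct, len(y_true), accuracy))

# Distribution of (predicted - true): 0 is an exact hit, +1/-1 is off by one grade
error_counts = Counter(p - t for p, t in zip(y_pred, y_true))
print(sorted(error_counts.items()))
```

The first line printed reproduces the 24 correct out of 56 reported by the script.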
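Note that SVMs work best on small data sets. This example trains on only the first 100 rows, and training slows down as more samples are used; see https://blog.csdn.net/myhaspl/article/details/83049188 for details. For comparison, on this same wine quality task the random forest algorithm on Alibaba Cloud Machine Learning PAI (see the earlier post https://blog.csdn.net/myhaspl/article/details/82958302) reached 82%, while a neural network (a deep network without convolution) reached only 58%.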
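
To get a feel for how training time grows with the number of samples, the fit can be timed at several sizes. The sketch below is an illustration under assumptions: it uses synthetic data from make_classification instead of the wine CSV files, the helper time_fit is a name invented here, and the measured times depend entirely on the machine, so no expected numbers are given.

```python
import time

from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic stand-in data (assumption), 11 features as in the white-wine data
X, y = make_classification(n_samples=2000, n_features=11, n_informative=6,
                           n_classes=3, random_state=0)

def time_fit(n_rows):
    """Return the seconds taken to fit an SVC on the first n_rows samples."""
    clf = svm.SVC(gamma="scale", kernel="poly", C=0.8, degree=3)
    start = time.time()
    clf.fit(X[:n_rows], y[:n_rows])
    return time.time() - start

fit_seconds = [(n, time_fit(n)) for n in (100, 1000, 2000)]
for n, seconds in fit_seconds:
    print("n=%d fit time: %.4f s" % (n, seconds))
```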

With the training set increased to 1000 samples, the accuracy improves further:

Preparing data
Building model
Training model
Test results
[5 5 5 6 6 6 7 5 6 5 6 6 6 5 6 5 7 7 6 5 7 5 6 6 5 5 5 5 5 5 5 5 7 7 7 5 5
 6 5 5 5 6 5 5 5 5 6 5 5 6 5 5 5 5 5 5]
[6 6 5 6 6 5 7 5 8 5 6 5 5 6 8 5 7 7 5 5 6 6 5 6 5 6 6 6 5 6 6 5 7 7 7 6 6
 7 4 6 5 5 5 5 5 6 5 6 6 5 6 5 5 5 5 4]
Correct samples: 26, total test samples: 56, accuracy: 0.464286

With 2000 samples:

Preparing data
Building model
Training model
Test results
[5 5 5 5 6 5 7 5 6 5 6 6 6 5 6 5 7 6 5 5 6 6 6 6 5 5 5 6 5 5 5 5 7 7 6 5 5
 6 5 5 5 6 5 5 5 6 6 6 6 6 5 5 5 5 5 5]
[6 6 5 6 6 5 7 5 8 5 6 5 5 6 8 5 7 7 5 5 6 6 5 6 5 6 6 6 5 6 6 5 7 7 7 6 6
 7 4 6 5 5 5 5 5 6 5 6 6 5 6 5 5 5 5 4]
Correct samples: 31, total test samples: 56, accuracy: 0.553571
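
The three runs above differ only in how many training rows are sliced off, so the comparison can be wrapped in a loop. The following is a hedged sketch rather than the script that produced the numbers above: it uses synthetic data from make_classification as a stand-in for the CSV files (an assumption), the helper accuracy_for_size is a name invented here, and the accuracies it prints will not match the wine results.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the wine CSV files (assumption)
X, y = make_classification(n_samples=2056, n_features=11, n_informative=6,
                           n_classes=3, random_state=0)
# Hold out the last 56 rows as a test set, mirroring the 56 test samples above
X_train, y_train = X[:2000], y[:2000]
X_test, y_test = X[2000:], y[2000:]

def accuracy_for_size(n_rows):
    """Train on the first n_rows training samples and score on the test set."""
    clf = svm.SVC(gamma="scale", kernel="poly", C=0.8, degree=3)
    clf.fit(X_train[:n_rows], y_train[:n_rows])
    return accuracy_score(y_test, clf.predict(X_test))

results = {n: accuracy_for_size(n) for n in (100, 1000, 2000)}
for n in sorted(results):
    print("train size %d: accuracy %g" % (n, results[n]))
```

With the real data, X_train and y_train would come from wineDf and the test set from testDf, as in the original script.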

How to install Scikit-learn
Scikit-learn requires:

Python (>= 2.7 or >= 3.4),
NumPy (>= 1.8.2),
SciPy (>= 0.13.3).
Warning: Scikit-learn 0.20 is the last version to support Python 2.7 and Python 3.4. Scikit-learn 0.21 requires Python 3.5 or newer.

If you already have a working installation of numpy and scipy, the easiest way to install scikit-learn is using pip:

pip install -U scikit-learn
or conda:

conda install scikit-learn
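
Before upgrading, it can help to check which Python is running and which scikit-learn version, if any, is already installed. The snippet below is a small sketch; supports_sklearn_021 is a hypothetical helper written for this post, and it encodes only the Python 3.5 requirement quoted in the warning above.

```python
import sys

def supports_sklearn_021(version_info=sys.version_info):
    """Return True if this Python meets the 3.5+ requirement of scikit-learn 0.21."""
    return tuple(version_info[:2]) >= (3, 5)

print("Python %d.%d" % tuple(sys.version_info[:2]))
print("can run scikit-learn 0.21: %s" % supports_sklearn_021())

# Report the installed scikit-learn version without failing if it is missing
try:
    import sklearn
    print("installed scikit-learn: %s" % sklearn.__version__)
except ImportError:
    print("scikit-learn is not installed")
```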
