sklearn study notes (3): multi-class classification with SVM

Support vector machines (SVMs) are a family of supervised learning methods used mainly for classification, regression, and outlier detection.

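1. The main advantages of SVMs are:

Effective in high-dimensional spaces;
Still effective when the number of dimensions is greater than the number of samples;
The decision function is built from a subset of the training samples (called support vectors), so it is memory efficient;
Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but a custom kernel can also be supplied.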
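
To illustrate the last point, here is a minimal sketch of passing a custom kernel to SVC as a Python callable. The function name `linear_kernel` and the toy data are assumptions made for this example; the callable simply has to return the Gram matrix between its two inputs. Since it reproduces the dot product, its predictions should match the built-in `kernel="linear"`.

```python
import numpy as np
from sklearn.svm import SVC


def linear_kernel(A, B):
    """Hypothetical custom kernel: returns the Gram matrix of shape (len(A), len(B))."""
    return np.dot(A, B.T)


# Toy data (assumed for illustration): two classes along a diagonal line.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])

custom_clf = SVC(kernel=linear_kernel).fit(X, y)
builtin_clf = SVC(kernel="linear").fit(X, y)

X_new = np.array([[0.5, 0.5], [2.5, 2.5]])
custom_pred = custom_clf.predict(X_new)
builtin_pred = builtin_clf.predict(X_new)
print(custom_pred, builtin_pred)
```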

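2. The main disadvantages of SVMs are:

When the number of features is much greater than the number of training samples, the method may overfit, so the choice of kernel function and regularization term is crucial;
SVMs do not directly provide probability estimates; when requested (probability=True), they are computed with an expensive five-fold cross-validation.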

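The SVMs in scikit-learn accept both dense and sparse sample vectors as input. However, to use an SVM to make predictions on sparse data (scipy.sparse), it must have been fit on such data. For best performance, use numpy.ndarray for dense vectors and scipy.sparse.csr_matrix for sparse vectors, both with dtype=float64.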
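
A minimal sketch of the sparse workflow described above: the model is fit on a scipy.sparse.csr_matrix and then asked to predict on sparse input as well. The variable names and the two-point data set are assumptions chosen to mirror the small examples used in these notes.

```python
import numpy as np
from scipy import sparse
from sklearn.svm import SVC

# Dense data with dtype=float64, then converted to CSR format.
X_dense = np.array([[0.0, 0.0], [1.0, 1.0]], dtype=np.float64)
y = [0, 1]
X_sparse = sparse.csr_matrix(X_dense)

# Fit on sparse data so that sparse inputs can be used at prediction time.
sparse_clf = SVC().fit(X_sparse, y)

X_new = sparse.csr_matrix(np.array([[2.0, 2.0]]))
sparse_pred = sparse_clf.predict(X_new)
print(sparse_pred)
```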

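3. Classification

SVC, NuSVC and LinearSVC are three classifiers capable of multi-class classification on a data set. SVC and NuSVC are similar methods, but they accept different parameters and have different mathematical formulations.
As classifiers, SVC, NuSVC and LinearSVC all take two arrays as input: the training samples X of shape [n_samples, n_features] and the class labels y of shape [n_samples] (strings or integers).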
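
As a small illustration of the input format, the sketch below fits an SVC with string class labels. The label names "neg" and "pos" are assumptions picked for this example; the fitted classes_ attribute holds the sorted unique labels, and predict returns labels of the same kind.

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1]]   # shape [n_samples, n_features]
y = ["neg", "pos"]     # shape [n_samples], string labels (assumed names)

label_clf = SVC().fit(X, y)
label_pred = label_clf.predict([[2.0, 2.0]])
print(label_clf.classes_, label_pred)
```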

3.1 Binary classification

The following code builds a simple SVM classifier:

>>> from sklearn import svm
>>> X = [[0, 0], [1, 1]]
>>> y = [0, 1]
>>> clf = svm.SVC()
>>> clf.fit(X, y)

The trained model can then be used for prediction:

>>> clf.predict([[2., 2.]])
array([1])

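The SVM decision function depends on a subset of the training data, called the support vectors. Their properties are available through three attributes: support_vectors_, support_ and n_support_.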

>>> # get the support vectors
>>> clf.support_vectors_
array([[ 0.,  0.],
       [ 1.,  1.]])
>>> # get the indices of the support vectors
>>> clf.support_
array([0, 1]...)
>>> # get the number of support vectors for each class
>>> clf.n_support_
array([1, 1]...)

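3.2 Multi-class classification

SVC and NuSVC use the "one-against-one" approach (Knerr et al., 1990) for multi-class classification. If n_class is the number of classes, then n_class * (n_class - 1) / 2 classifiers are constructed, each trained on data from two classes. To provide an interface consistent with other classifiers, the decision_function_shape option allows the results of the "one-against-one" classifiers to be aggregated into a decision function of shape (n_samples, n_classes). The example below first trains a four-class SVC with probability=True and prints the predicted labels, the class probabilities, and the log-probabilities: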

from sklearn.svm import SVC
X = [[0,0], [1,1],[2,2],[3,3]]
Y = [0, 1,2,3]
clf = SVC( probability=True)
clf.fit(X,Y)
print(clf.predict([[0,0], [1,1],[2,2],[3,3]]))
print(clf.predict_proba([[0,0], [1,1],[2,2],[3,3]]))
print(clf.predict_log_proba([[0,0], [1,1],[2,2],[3,3]]))

Output:
[0 1 2 3]
[[ 0.15246393  0.23705461  0.30392427  0.30655719]
 [ 0.2550524   0.16488868  0.25497241  0.3250865 ]
 [ 0.32594085  0.25411181  0.16480942  0.25513792]
 [ 0.30659971  0.30340014  0.23672633  0.15327383]]
[[-1.88082723 -1.43946473 -1.19097672 -1.18235096]
 [-1.36628626 -1.80248468 -1.36659992 -1.12366397]
 [-1.12103935 -1.36998093 -1.80296549 -1.36595102]
 [-1.18221227 -1.19270275 -1.44085055 -1.87552923]]

Note:

clf = SVC() defaults to probability=False.

In that case, calling clf.predict_proba([[0,0], [1,1],[2,2],[3,3]]) raises the following error:

--------------------------------------------------
AttributeError   Traceback (most recent call last)
<ipython-input-45-4b8ec837cc18> in <module>()
      5 clf.fit(X,Y)
      6 print(clf.predict([[0,0], [1,1],[2,2],[3,3]]))
----> 7 print(clf.predict_proba([[0,0], [1,1],[2,2],[3,3]]))
      8 print(clf.predict_log_proba([[0,0], [1,1],[2,2],[3,3]]))
      9 #clf.predict([[2.,2.]])

D:\program\anaconda3\lib\site-packages\sklearn\svm\base.py in predict_proba(self)
    588         datasets.
    589         """
--> 590         self._check_proba()
    591         return self._predict_proba
    592 

D:\program\anaconda3\lib\site-packages\sklearn\svm\base.py in _check_proba(self)
    555     def _check_proba(self):
    556         if not self.probability:
--> 557             raise AttributeError("predict_proba is not available when "
    558                                  " probability=False")
    559         if self._impl not in ('c_svc', 'nu_svc'):

AttributeError: predict_proba is not available when  probability=False
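
One way to avoid this error is to check for the method before calling it. The sketch below assumes the same four-sample data as above; the names `plain_clf` and `proba_clf` are introduced only for this illustration. Because accessing predict_proba raises AttributeError when probability=False, hasattr reports it as missing.

```python
import numpy as np
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [2, 2], [3, 3]]
Y = [0, 1, 2, 3]

plain_clf = SVC().fit(X, Y)                  # probability=False by default
proba_clf = SVC(probability=True).fit(X, Y)  # enables predict_proba

# hasattr is False when probability estimates were not requested.
plain_has_proba = hasattr(plain_clf, "predict_proba")

# Each row holds one probability per class and sums to 1.
proba = proba_clf.predict_proba(X)
print(plain_has_proba, proba.shape, proba.sum(axis=1))
```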

The SVM decision function depends on the support-vector subset of the training data. These attributes can be inspected as follows:


#get support vector
print(clf.support_vectors_)
#get indices of support vectors
print(clf.support_)
#get number of support vectors for each class
clf.n_support_
Output:
[[ 0.  0.]
 [ 1.  1.]
 [ 2.  2.]
 [ 3.  3.]]

[0 1 2 3]

array([1, 1, 1, 1])

Another way to inspect the model is through its decision function.

One-vs-one scheme:



clf = SVC(decision_function_shape='ovo')
clf.fit(X, Y)
#dec = clf.decision_function([[1]])
#print (dec.shape[1]) # 4 classes: 4*3/2 = 6
#print (clf.predict([[1]]))
 
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovo', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

clf.decision_function_shape
'ovo'

dec = clf.decision_function([[0,0], [1,1],[2,2],[3,3]])
print(dec)
print (dec.shape) # 4 classes: 4*3/2 = 6
print (clf.predict([[0,0], [1,1],[2,2],[3,3]]))
dec

[[ 0.63212056  0.98168436  0.99987659  0.3495638   0.36775603  0.01819223]
 [-0.63212056  0.          0.3495638   0.63212056  0.98168436  0.3495638 ]
 [-0.3495638  -0.98168436 -0.3495638  -0.63212056  0.          0.63212056]
 [-0.01819223 -0.36775603 -0.99987659 -0.3495638  -0.98168436 -0.63212056]]
(4, 6)
[0 1 2 3]
array([[ 0.63212056,  0.98168436,  0.99987659,  0.3495638 ,  0.36775603,
         0.01819223],
       [-0.63212056,  0.        ,  0.3495638 ,  0.63212056,  0.98168436,
         0.3495638 ],
       [-0.3495638 , -0.98168436, -0.3495638 , -0.63212056,  0.        ,
         0.63212056],
       [-0.01819223, -0.36775603, -0.99987659, -0.3495638 , -0.98168436,
        -0.63212056]])
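
To read the six columns above, it helps to list the class pairs they correspond to. The sketch below assumes the documented scikit-learn column order for "ovo" ("0 vs 1", "0 vs 2", ..., "2 vs 3") and rebuilds it with itertools.combinations; the name `ovo_clf` is introduced only for this example. In the output above, a positive value in a column appears to favor the first class of the pair.

```python
from itertools import combinations

from sklearn.svm import SVC

X = [[0, 0], [1, 1], [2, 2], [3, 3]]
Y = [0, 1, 2, 3]

ovo_clf = SVC(decision_function_shape="ovo").fit(X, Y)

# One pairwise classifier per unordered pair of classes: 4 * 3 / 2 = 6.
pairs = list(combinations(ovo_clf.classes_.tolist(), 2))
dec = ovo_clf.decision_function(X)

for column, (a, b) in enumerate(pairs):
    print(f"column {column}: class {a} vs class {b}")
```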

One-vs-rest scheme:

clf.decision_function_shape = "ovr"
dec = clf.decision_function([[0,0], [1,1],[2,2],[3,3]])
print(dec)
print (dec.shape) # (n_samples, n_classes) = (4, 4)
print (clf.predict([[0,0], [1,1],[2,2],[3,3]]))
dec

[[ 3.5         2.01629871  0.74881103 -0.26510974]
 [ 1.9459466   3.42964789  0.9459466  -0.32154108]
 [-0.32154108  1.9459466   3.42964789  0.9459466 ]
 [-0.26510974  0.74881103  2.01629871  3.5       ]]
(4, 4)
[0 1 2 3]
array([[ 3.5       ,  2.01629871,  0.74881103, -0.26510974],
       [ 1.9459466 ,  3.42964789,  0.9459466 , -0.32154108],
       [-0.32154108,  1.9459466 ,  3.42964789,  0.9459466 ],
       [-0.26510974,  0.74881103,  2.01629871,  3.5       ]])

LinearSVC also implements the "one-vs-the-rest" multi-class strategy.

from sklearn.svm import LinearSVC
lin_clf = LinearSVC()
print(lin_clf)
lin_clf.fit(X,Y)
dec = lin_clf.decision_function([[0,0], [1,1],[2,2],[3,3]])
print (dec.shape)
print (lin_clf.predict([[0,0], [1,1],[2,2],[3,3]]))

dec

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)
(4, 4)
[0 1 3 3]
array([[ 0.47058692, -0.15999794, -0.58668729, -1.02701668],
       [-0.70588262, -0.37335533, -0.48003017, -0.59461132],
       [-1.88235215, -0.58671272, -0.37337305, -0.16220597],
       [-3.05882169, -0.80007011, -0.26671593,  0.27019939]])
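
Because LinearSVC trains one linear model per class under "one-vs-the-rest", the fitted coefficients have one row per class. The sketch below, using the same four-sample data, only checks the shapes of coef_ and intercept_; the variable names `coef_shape` and `intercept_shape` are assumptions for this example.

```python
from sklearn.svm import LinearSVC

X = [[0, 0], [1, 1], [2, 2], [3, 3]]
Y = [0, 1, 2, 3]

lin_clf = LinearSVC().fit(X, Y)

# One weight vector and one intercept per class.
coef_shape = lin_clf.coef_.shape            # (n_classes, n_features)
intercept_shape = lin_clf.intercept_.shape  # (n_classes,)
print(coef_shape, intercept_shape)
```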

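Scores and probabilities
The decision_function method of SVC gives each sample a score for every class. When probability is set to True, class-membership probabilities can be estimated with predict_proba and predict_log_proba. The multi-class probability estimates follow:
Wu, Lin and Weng, “Probability estimates for multi-class classification by pairwise coupling”, JMLR 5:975-1005, 2004.
Unbalanced problems
The class_weight and sample_weight keywords adjust the weight given to particular classes or particular samples: class_weight is set when constructing the classifier, while sample_weight is passed to fit.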
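
A minimal sketch of both keywords on an imbalanced toy data set. The data, the weight values, and the variable names are assumptions made for illustration only. class_weight={1: 10} multiplies C by 10 for class 1, and the fitted class_weight_ attribute exposes the resulting per-class multipliers.

```python
import numpy as np
from sklearn.svm import SVC

# Imbalanced toy data (assumed): eight samples of class 0, two of class 1.
X = np.array([[0.0], [0.2], [0.4], [0.6], [0.8],
              [1.0], [1.2], [1.4], [2.0], [2.2]])
y = np.array([0] * 8 + [1] * 2)

# Per-class weighting, given to the constructor.
weighted_clf = SVC(kernel="linear", class_weight={1: 10}).fit(X, y)
class_multipliers = weighted_clf.class_weight_

# Per-sample weighting, given to fit: emphasize the last sample.
sample_weight = np.ones(len(y))
sample_weight[-1] = 5.0
sample_clf = SVC(kernel="linear").fit(X, y, sample_weight=sample_weight)

print(class_multipliers, sample_clf.predict(X))
```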

------------

References: https://blog.csdn.net/babybirdtofly/article/details/72886879

https://blog.csdn.net/u013709270/article/details/53365744