sklearn中的k折交叉验证

K折交叉验证： sklearn.model_selection.KFold(n_splits=3,shuffle=False,random_state=None)

思路：将训练/测试数据划分n_splits个互斥子集，每次用其中一个子集当作验证集，剩下的n_splits-1个作为训练集，进行n_splits次训练和测试，得到n_splits个结果

注意：对于不能均等分的数据集，前n_samples%n_splits子集拥有n_samples//n_splits+1个样本，其余子集只有n_samples//n_splits个样本

参数：

n_splits:表示划分几等份

shuffle:在每次划分时，是否进行洗牌

1)若为False，其效果等同于random_state等于整数，每次划分的结果相同

2）若为True时，每次划分的结果都不一样，表示经过洗牌，随机取样的

random_state：随机种子数

属性：

①get_n_splits(X=None, y=None, groups=None)：获取参数n_splits的值

②split(X, y=None, groups=None)：将数据集划分成训练集和测试集，返回索引生成器

A:设置shuffle=False，运行两次，发现两次结果相同

from sklearn.model_selection import KFold
import numpy as np

#设置shuffle=False，运行两次，发现两次结果相同
X=np.arange(24).reshape(12,2)
y=np.random.choice([1,2],12,p=[0.4,0.6])#1,2 总共出现12次，其中1出现的概率为0.4,2出现的概率为0.6
kf=KFold(n_splits=5,shuffle=False)
for train_index,test_index in kf.split(X):
    print('train_index %s, test_index %s'%(train_index,test_index))

X=np.arange(24).reshape(12,2)
y=np.random.choice([1,2],12,p=[0.4,0.6])#1,2 总共出现12次，其中1出现的概率为0.4,2出现的概率为0.6
kf=KFold(n_splits=5,shuffle=False)
for train_index,test_index in kf.split(X):
    print('train_index %s, test_index %s'%(train_index,test_index))

'''
train_index [ 3  4  5  6  7  8  9 10 11], test_index [0 1 2]
train_index [ 0  1  2  6  7  8  9 10 11], test_index [3 4 5]
train_index [ 0  1  2  3  4  5  8  9 10 11], test_index [6 7]
train_index [ 0  1  2  3  4  5  6  7 10 11], test_index [8 9]
train_index [0 1 2 3 4 5 6 7 8 9], test_index [10 11]
train_index [ 3  4  5  6  7  8  9 10 11], test_index [0 1 2]
train_index [ 0  1  2  6  7  8  9 10 11], test_index [3 4 5]
train_index [ 0  1  2  3  4  5  8  9 10 11], test_index [6 7]
train_index [ 0  1  2  3  4  5  6  7 10 11], test_index [8 9]
train_index [0 1 2 3 4 5 6 7 8 9], test_index [10 11]
'''

B. 设置shuffle=True时，运行两次，发现两次运行的结果不同

from sklearn.model_selection import KFold
import numpy as np

#设置shuffle=False，运行两次，发现两次结果相同
X=np.arange(24).reshape(12,2)
y=np.random.choice([1,2],12,p=[0.4,0.6])#1,2 总共出现12次，其中1出现的概率为0.4,2出现的概率为0.6
kf=KFold(n_splits=5,shuffle=True)
for train_index,test_index in kf.split(X):
    print('train_index %s, test_index %s'%(train_index,test_index))
print('-----------------------------------------------------------')
X=np.arange(24).reshape(12,2)
y=np.random.choice([1,2],12,p=[0.4,0.6])#1,2 总共出现12次，其中1出现的概率为0.4,2出现的概率为0.6
kf=KFold(n_splits=5,shuffle=True)
for train_index,test_index in kf.split(X):
    print('train_index %s, test_index %s'%(train_index,test_index))
'''
train_index [ 0  2  3  4  5  6  8 10 11], test_index [1 7 9]
train_index [ 1  2  3  4  5  7  9 10 11], test_index [0 6 8]
train_index [ 0  1  2  3  5  6  7  8  9 10], test_index [ 4 11]
train_index [ 0  1  2  3  4  6  7  8  9 11], test_index [ 5 10]
train_index [ 0  1  4  5  6  7  8  9 10 11], test_index [2 3]
-----------------------------------------------------------
train_index [ 0  1  2  3  4  6  7  8 11], test_index [ 5  9 10]
train_index [ 0  3  4  5  6  8  9 10 11], test_index [1 2 7]
train_index [ 0  1  2  4  5  6  7  8  9 10], test_index [ 3 11]
train_index [ 0  1  2  3  5  7  8  9 10 11], test_index [4 6]
train_index [ 1  2  3  4  5  6  7  9 10 11], test_index [0 8]
'''

C: 设置shuffle=True和random_state=整数，发现每次运行的结果都相同

from sklearn.model_selection import KFold
import numpy as np

#设置shuffle=False，运行两次，发现两次结果相同
X=np.arange(24).reshape(12,2)
y=np.random.choice([1,2],12,p=[0.4,0.6])#1,2 总共出现12次，其中1出现的概率为0.4,2出现的概率为0.6
kf=KFold(n_splits=5,shuffle=True,random_state=10)
for train_index,test_index in kf.split(X):
    print('train_index %s, test_index %s'%(train_index,test_index))
print('-----------------------------------------------------------')
X=np.arange(24).reshape(12,2)
y=np.random.choice([1,2],12,p=[0.4,0.6])#1,2 总共出现12次，其中1出现的概率为0.4,2出现的概率为0.6
kf=KFold(n_splits=5,shuffle=True,random_state=10)
for train_index,test_index in kf.split(X):
    print('train_index %s, test_index %s'%(train_index,test_index))
'''
0  1  3  4  5  8  9 10 11], test_index [2 6 7]
train_index [ 0  1  2  3  4  6  7  9 10], test_index [ 5  8 11]
train_index [ 0  1  2  4  5  6  7  8  9 11], test_index [ 3 10]
train_index [ 2  3  4  5  6  7  8  9 10 11], test_index [0 1]
train_index [ 0  1  2  3  5  6  7  8 10 11], test_index [4 9]
-----------------------------------------------------------
train_index [ 0  1  3  4  5  8  9 10 11], test_index [2 6 7]
train_index [ 0  1  2  3  4  6  7  9 10], test_index [ 5  8 11]
train_index [ 0  1  2  4  5  6  7  8  9 11], test_index [ 3 10]
train_index [ 2  3  4  5  6  7  8  9 10 11], test_index [0 1]
train_index [ 0  1  2  3  5  6  7  8 10 11], test_index [4 9]
'''

sklearn中的k折交叉验证

K折交叉验证： sklearn.model_selection.KFold(n_splits=3,shuffle=False,random_state=None)

猜你喜欢