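K-fold cross-validation: sklearn.model_selection.KFold(n_splits=3, shuffle=False, random_state=None) (signature from an older release; since scikit-learn 0.22 the default is n_splits=5)
Idea: split the dataset into n_splits mutually exclusive subsets (folds). In each round, one fold serves as the validation set and the remaining n_splits-1 folds serve as the training set. Training and testing are repeated n_splits times, which yields n_splits results
Note: when the dataset cannot be divided evenly, the first n_samples % n_splits folds contain n_samples // n_splits + 1 samples each, and the remaining folds contain n_samples // n_splits samples each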
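The fold-size rule can be checked directly. The following is a minimal sketch, assuming 12 samples and 5 folds (the same sizes as the example later in this post); the names expected_sizes and actual_sizes are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples, n_splits = 12, 5
X = np.arange(n_samples * 2).reshape(n_samples, 2)

# Sizes predicted by the rule: the first n_samples % n_splits folds
# get one extra sample.
expected_sizes = [
    n_samples // n_splits + 1 if i < n_samples % n_splits else n_samples // n_splits
    for i in range(n_splits)
]

# Sizes actually produced by KFold (length of the test fold in each split).
actual_sizes = [len(test_index) for _, test_index in KFold(n_splits=n_splits).split(X)]

print(expected_sizes)  # [3, 3, 2, 2, 2]
print(actual_sizes)    # [3, 3, 2, 2, 2]
```

Here 12 % 5 = 2 and 12 // 5 = 2, so the first two folds hold 3 samples and the other three hold 2.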
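Parameters:
n_splits: the number of folds to split the data into (at least 2)
shuffle: whether to shuffle the samples before splitting them into folds
① If False, nothing is shuffled: the folds are consecutive blocks of indices and the split is identical on every run. random_state is not used in this case (recent scikit-learn versions raise a ValueError if it is set while shuffle=False)
② If True, the samples are shuffled before splitting, so the split differs from run to run unless random_state is fixed to an integer
random_state: the random seed. It only takes effect when shuffle=True, where an integer value makes the shuffled split reproducible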
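Methods:
① get_n_splits(X=None, y=None, groups=None): returns the number of splitting iterations, i.e. the value of n_splits
② split(X, y=None, groups=None): splits the dataset into training and test sets and returns a generator of (train_index, test_index) index arrays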
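Since split only yields index arrays, the indices still have to be applied to the data. A minimal sketch of the "train n_splits times, get n_splits results" idea follows. The synthetic dataset from make_classification and the LogisticRegression model are assumptions picked only for illustration; any estimator with fit and score would do.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic data, assumed here only so that the example is self-contained.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_index, test_index in kf.split(X):
    # Fit on the n_splits-1 training folds, evaluate on the held-out fold.
    model = LogisticRegression()
    model.fit(X[train_index], y[train_index])
    scores.append(model.score(X[test_index], y[test_index]))

print(len(scores))      # one accuracy value per fold
print(np.mean(scores))  # average accuracy across the folds
```

A fresh model is created inside the loop so that no fitted state carries over from one fold to the next.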
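Using an example that cannot be split evenly (12 samples, 5 folds), set different parameter values and observe the results
① Set shuffle=False and run twice: both runs give the same result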
- In [1]: from sklearn.model_selection import KFold
- ...: import numpy as np
- ...: X = np.arange(24).reshape(12,2)
- ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
- ...: kf = KFold(n_splits=5,shuffle=False)
- ...: for train_index , test_index in kf.split(X):
- ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
- ...:
- ...:
- train_index:[ 3 4 5 6 7 8 9 10 11] , test_index: [0 1 2]
- train_index:[ 0 1 2 6 7 8 9 10 11] , test_index: [3 4 5]
- train_index:[ 0 1 2 3 4 5 8 9 10 11] , test_index: [6 7]
- train_index:[ 0 1 2 3 4 5 6 7 10 11] , test_index: [8 9]
- train_index:[0 1 2 3 4 5 6 7 8 9] , test_index: [10 11]
- In [2]: from sklearn.model_selection import KFold
- ...: import numpy as np
- ...: X = np.arange(24).reshape(12,2)
- ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
- ...: kf = KFold(n_splits=5,shuffle=False)
- ...: for train_index , test_index in kf.split(X):
- ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
- ...:
- ...:
- train_index:[ 3 4 5 6 7 8 9 10 11] , test_index: [0 1 2]
- train_index:[ 0 1 2 6 7 8 9 10 11] , test_index: [3 4 5]
- train_index:[ 0 1 2 3 4 5 8 9 10 11] , test_index: [6 7]
- train_index:[ 0 1 2 3 4 5 6 7 10 11] , test_index: [8 9]
- train_index:[0 1 2 3 4 5 6 7 8 9] , test_index: [10 11]
② Set shuffle=True without random_state and run twice: the two runs give different results
- In [3]: from sklearn.model_selection import KFold
- ...: import numpy as np
- ...: X = np.arange(24).reshape(12,2)
- ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
- ...: kf = KFold(n_splits=5,shuffle=True)
- ...: for train_index , test_index in kf.split(X):
- ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
- ...:
- ...:
- train_index:[ 0 1 2 4 5 6 7 8 10] , test_index: [ 3 9 11]
- train_index:[ 0 1 2 3 4 5 9 10 11] , test_index: [6 7 8]
- train_index:[ 2 3 4 5 6 7 8 9 10 11] , test_index: [0 1]
- train_index:[ 0 1 3 4 5 6 7 8 9 11] , test_index: [ 2 10]
- train_index:[ 0 1 2 3 6 7 8 9 10 11] , test_index: [4 5]
- In [4]: from sklearn.model_selection import KFold
- ...: import numpy as np
- ...: X = np.arange(24).reshape(12,2)
- ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
- ...: kf = KFold(n_splits=5,shuffle=True)
- ...: for train_index , test_index in kf.split(X):
- ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
- ...:
- ...:
- train_index:[ 0 1 2 3 4 5 7 8 11] , test_index: [ 6 9 10]
- train_index:[ 2 3 4 5 6 8 9 10 11] , test_index: [0 1 7]
- train_index:[ 0 1 3 5 6 7 8 9 10 11] , test_index: [2 4]
- train_index:[ 0 1 2 3 4 6 7 9 10 11] , test_index: [5 8]
- train_index:[ 0 1 2 4 5 6 7 8 9 10] , test_index: [ 3 11]
③ Set shuffle=True with random_state=0 and run twice: both runs give the same result
- In [5]: from sklearn.model_selection import KFold
- ...: import numpy as np
- ...: X = np.arange(24).reshape(12,2)
- ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
- ...: kf = KFold(n_splits=5,shuffle=True,random_state=0)
- ...: for train_index , test_index in kf.split(X):
- ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
- ...:
- ...:
- train_index:[ 0 1 2 3 5 7 8 9 10] , test_index: [ 4 6 11]
- train_index:[ 0 1 3 4 5 6 7 9 11] , test_index: [ 2 8 10]
- train_index:[ 0 2 3 4 5 6 8 9 10 11] , test_index: [1 7]
- train_index:[ 0 1 2 4 5 6 7 8 10 11] , test_index: [3 9]
- train_index:[ 1 2 3 4 6 7 8 9 10 11] , test_index: [0 5]
- In [6]: from sklearn.model_selection import KFold
- ...: import numpy as np
- ...: X = np.arange(24).reshape(12,2)
- ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
- ...: kf = KFold(n_splits=5,shuffle=True,random_state=0)
- ...: for train_index , test_index in kf.split(X):
- ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
- ...:
- ...:
- train_index:[ 0 1 2 3 5 7 8 9 10] , test_index: [ 4 6 11]
- train_index:[ 0 1 3 4 5 6 7 9 11] , test_index: [ 2 8 10]
- train_index:[ 0 2 3 4 5 6 8 9 10 11] , test_index: [1 7]
- train_index:[ 0 1 2 4 5 6 7 8 10 11] , test_index: [3 9]
- train_index:[ 1 2 3 4 6 7 8 9 10 11] , test_index: [0 5]
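The observations from these runs can also be checked in code instead of by eye. This is a minimal sketch; the helper fold_test_indices is a hypothetical name introduced here, not part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(12, 2)

def fold_test_indices(kf):
    """Return the test indices of every fold as a list of tuples."""
    return [tuple(int(i) for i in test_index) for _, test_index in kf.split(X)]

# shuffle=False: consecutive folds, identical on every run.
same_unshuffled = fold_test_indices(KFold(n_splits=5)) == fold_test_indices(KFold(n_splits=5))

# shuffle=True with a fixed seed: shuffled, but reproducible.
seeded_a = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=0))
seeded_b = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=0))
same_seeded = seeded_a == seeded_b

# Whatever the settings, the test folds together cover every sample exactly once.
covered = sorted(i for fold in seeded_a for i in fold)

print(same_unshuffled, same_seeded)  # True True
print(covered == list(range(12)))    # True
```

The shuffle=True case without random_state is left out of the equality checks on purpose: two such runs usually differ, but nothing guarantees it.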
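4. With enumerate, you can print the index of each fold along with the fold itself
import numpy as np
from sklearn.model_selection import StratifiedKFold
NFOLDS = 5  # not defined in the original snippet; 5 matches the five folds printed below
X = np.ones(10)
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
# class 0 has only 4 samples, fewer than NFOLDS, so scikit-learn emits a warning here
kfold = StratifiedKFold(n_splits=NFOLDS, shuffle=True, random_state=218)
kf = kfold.split(X, y)
for i, (train_fold, validate) in enumerate(kf):
    print(i, train_fold, validate)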
0 [0 1 3 4 5 7 9] [2 6 8]
1 [0 1 2 3 4 6 7 8 9] [5]
2 [0 2 3 5 6 7 8 9] [1 4]
3 [1 2 3 4 5 6 8 9] [0 7]
4 [0 1 2 4 5 6 7 8] [3 9]
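The snippet above uses StratifiedKFold rather than KFold; it splits each class separately so that every fold keeps roughly the same class ratio as y. A minimal sketch of that behaviour follows, assuming n_splits=2 (instead of 5) so that each class has at least n_splits samples; fold_class_counts is a name made up for this illustration.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.ones(10)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# n_splits=2 is an assumption made here so that no class is smaller than n_splits.
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=218)

fold_class_counts = []
for i, (train_fold, validate) in enumerate(skf.split(X, y)):
    # Count how many samples of each class ended up in the validation fold.
    counts = Counter(y[validate].tolist())
    fold_class_counts.append((counts[0], counts[1]))
    print(i, validate, dict(counts))
```

Each validation fold receives 2 samples of class 0 and 3 of class 1, which is the 4:6 ratio of y; which particular indices land in each fold depends on the seed and the scikit-learn version.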
5. How to get the value of n_splits (kf here is the KFold object from the IPython session above)
- In [8]: kf.split(X)
- Out[8]: <generator object _BaseKFold.split at 0x00000000047FF990>
- In [9]: kf.get_n_splits()
- Out[9]: 5
- In [10]: kf.n_splits
- Out[10]: 5
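As a script rather than an interactive session, the same lookups can be sketched as follows; counting the items that the generator yields is added here only as a cross-check.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(12, 2)
kf = KFold(n_splits=5)

splits = kf.split(X)                       # lazy generator; no fold is produced yet
n_from_method = kf.get_n_splits()          # via the method
n_from_attribute = kf.n_splits             # via the attribute set in the constructor
n_from_generator = sum(1 for _ in splits)  # by consuming the generator

print(n_from_method, n_from_attribute, n_from_generator)  # 5 5 5
```

Note that the generator is exhausted after one pass; call kf.split(X) again to iterate over the folds a second time.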