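Overview
1. The KNN algorithm
The core idea of the K-nearest-neighbor (k-Nearest Neighbor, KNN) classification algorithm is this: if most of the k samples most similar to a given sample (that is, its nearest neighbors in feature space) belong to one class, then that sample is assigned to the same class. KNN handles multi-class problems directly, and it can be used for regression as well as classification: find the k nearest neighbors of a sample and assign it the average of those neighbors' target values as the prediction.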
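To make the voting and averaging idea concrete, here is a minimal from-scratch sketch. It is not how scikit-learn implements KNN; the function name knn_predict, its mode parameter, and the toy data are all made up for illustration, and Euclidean distance is assumed.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k=3, mode="classify"):
    """Predict for one query point from its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training sample.
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # Indices of the k closest samples (stable sort keeps ties in input order).
    nearest = np.argsort(distances, kind="stable")[:k]
    neighbor_targets = [y_train[i] for i in nearest]
    if mode == "classify":
        # Classification: majority vote among the neighbors.
        return Counter(neighbor_targets).most_common(1)[0][0]
    # Regression: average the neighbors' target values.
    return float(np.mean(neighbor_targets))


# Hypothetical toy data: one feature, two classes / a numeric target.
X = [[0], [1], [2], [3], [4], [5]]
labels = ["a", "a", "a", "b", "b", "b"]
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

print(knn_predict(X, labels, [1.2], k=3))                  # a
print(knn_predict(X, values, [4.1], k=3, mode="regress"))  # 4.0
```

For the query 1.2 the three nearest samples are 1, 2 and 0, all of class "a". For 4.1 the neighbors are 4, 5 and 3, whose targets average to 4.0.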
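In scikit-learn, KNeighborsClassifier lives in the sklearn.neighbors module. Using it takes three simple steps:
1) create a KNeighborsClassifier object,
2) call fit on the training data,
3) call predict to make predictions.
The following code shows the usage.
Example 1: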
from sklearn.neighbors import KNeighborsClassifier

# Nine one-dimensional samples in three classes (0, 1, 2)
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
print(neigh.predict([[1.1]]))  # result: [0]
print(neigh.predict([[1.6]]))  # result: [0]
print(neigh.predict([[5.2]]))  # result: [1]
print(neigh.predict([[5.8]]))  # result: [2]
print(neigh.predict([[6.2]]))  # result: [2]
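To see why a query near a class boundary gets the label it does, it can help to look at the neighbors that voted. The sketch below rebuilds the same toy model and uses the classifier's kneighbors method; the variable name neighbor_labels is only illustrative.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
neigh = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# kneighbors returns (distances, indices) of the nearest training samples,
# ordered from closest to farthest.
distances, indices = neigh.kneighbors([[5.8]])
neighbor_labels = [y[i] for i in indices[0]]

print(indices[0])              # positions of the neighbors in X
print(neighbor_labels)         # their class labels, which are put to a vote
print(neigh.predict([[5.8]]))  # the majority label
```

For 5.8 the three nearest samples are 6, 5 and 7, with labels 2, 1 and 2, so class 2 wins the vote two to one.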
Example 2:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset (features and class labels)
iris = datasets.load_iris()
iris_X = iris.data
iris_Y = iris.target
# Hold out 30% of the samples for testing
X_train, X_test, Y_train, Y_test = train_test_split(iris_X, iris_Y, test_size=0.3)
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
# Compare the predictions with the true labels
print(knn.predict(X_test))
print(Y_test)
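Comparing two printed arrays by eye gets tedious, so a single accuracy figure is usually more convenient. The following is a sketch under a few assumptions that are not in the example above: random_state and stratify are added only to make the split reproducible and balanced, and the names Y_pred and test_accuracy are illustrative.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
# random_state fixes the split (an assumption for reproducibility);
# stratify keeps the class proportions the same in both parts.
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target)

knn = KNeighborsClassifier()  # n_neighbors defaults to 5
knn.fit(X_train, Y_train)
Y_pred = knn.predict(X_test)

# Fraction of test samples whose predicted class matches the true class.
test_accuracy = accuracy_score(Y_test, Y_pred)
print(test_accuracy)
# knn.score computes the same mean accuracy in one call.
print(knn.score(X_test, Y_test))
```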
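2. A worked example
1) The wheat seeds dataset (seeds)
There are seven features: area, perimeter, compactness, kernel length, kernel width, asymmetry coefficient, and kernel groove length. Each row holds the seven feature values followed by the variety name (for example, Kama). The data looks like this: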
15.26 14.84 0.871 5.763 3.312 2.221 5.22 Kama
14.88 14.57 0.8811 5.554 3.333 1.018 4.956 Kama
14.29 14.09 0.905 5.291 3.337 2.699 4.825 Kama
13.84 13.94 0.8955 5.324 3.379 2.259 4.805 Kama
16.14 14.99 0.9034 5.658 3.562 1.355 5.175 Kama
14.38 14.21 0.8951 5.386 3.312 2.462 4.956 Kama
14.69 14.49 0.8799 5.563 3.259 3.586 5.219 Kama
14.11 14.1 0.8911 5.42 3.302 2.7 5.0 Kama
16.63 15.46 0.8747 6.053 3.465 2.04 5.877 Kama
2) Code
# -*- coding:utf-8 -*-
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold

feature_names = [
    'area',
    'perimeter',
    'compactness',
    'length of kernel',
    'width of kernel',
    'asymmetry coefficient',
    'length of kernel groove',
]

COLOUR_FIGURE = False


def plot_decision(features, labels, num_neighbors=3):
    # Plot the decision regions using features 0 (area) and 2 (compactness).
    y_min, y_max = features[:, 2].min() * .9, features[:, 2].max() * 1.1
    x_min, x_max = features[:, 0].min() * .9, features[:, 0].max() * 1.1
    # Dense grid over the plotting area; every grid point gets classified.
    X, Y = np.meshgrid(np.linspace(x_min, x_max, 1000), np.linspace(y_min, y_max, 1000))
    model = KNeighborsClassifier(n_neighbors=num_neighbors)
    model.fit(features[:, [0, 2]], labels)
    C = model.predict(np.vstack([X.ravel(), Y.ravel()]).T).reshape(X.shape)
    if COLOUR_FIGURE:
        cmap = ListedColormap([(1., .7, .7), (.7, 1., .7), (.7, .7, 1.)])
    else:
        cmap = ListedColormap([(1., 1., 1.), (.2, .2, .2), (.6, .6, .6)])
    fig, ax = plt.subplots()
    ax.set_xlim(x_min, x_max)
    ax.set_ylim(y_min, y_max)
    ax.set_xlabel(feature_names[0])
    ax.set_ylabel(feature_names[2])
    # Colour each grid cell by its predicted class.
    ax.pcolormesh(X, Y, C, cmap=cmap)
    if COLOUR_FIGURE:
        cmap = ListedColormap([(1., .0, .0), (.1, .6, .1), (.0, .0, 1.)])
        ax.scatter(features[:, 0], features[:, 2], c=labels, cmap=cmap)
    else:
        # One marker shape per class: diamond, circle, triangle.
        for lab, ma in zip(range(3), "Do^"):
            ax.plot(features[labels == lab, 0],
                    features[labels == lab, 2],
                    ma,
                    c=(1., 1., 1.),
                    ms=6)
    return fig, ax


def load_csv_data(filename):
    # Read a tab-separated file: feature columns first, class label last.
    data = []
    labels = []
    with open(filename) as datafile:
        for line in datafile:
            if not line.strip():
                continue  # skip blank lines
            fields = line.strip().split('\t')
            data.append([float(field) for field in fields[:-1]])
            labels.append(fields[-1])
    data = np.array(data)
    labels = np.array(labels)
    return data, labels


def accuracy(test_labels, pred_labels):
    # Fraction of predictions that match the true labels.
    correct = np.sum(test_labels == pred_labels)
    n = len(test_labels)
    return float(correct) / n


if __name__ == '__main__':
    opt = input("Enter 1 or 2: ")
    features, labels = load_csv_data('data/seeds.tsv')
    if opt == '1':
        knn = KNeighborsClassifier(n_neighbors=5)
        # 3-fold cross-validation; kf.split yields (train, test) index arrays.
        kf = KFold(n_splits=3, shuffle=True)
        result_set = [(knn.fit(features[train], labels[train]).predict(features[test]), test)
                      for train, test in kf.split(features)]
        score = [accuracy(labels[result[1]], result[0]) for result in result_set]
        print(score)
    elif opt == '2':
        # Map the class names to integers 0..2 for plotting.
        names = sorted(set(labels))
        labels = np.array([names.index(ell) for ell in labels])
        fig, ax = plot_decision(features, labels)
        plt.show()
    else:
        print('input 1 or 2 !')
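Brief notes on the code
load_csv_data reads the features and labels from the data file.
accuracy computes the prediction accuracy.
plot_decision draws the decision regions using two of the features (area and compactness). Pay attention to the call to pcolormesh in this function.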
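Since pcolormesh is the part of plot_decision most likely to confuse, here is a small standalone sketch of the same grid-then-colour technique. The toy points, the grid sizes and the Agg backend line are assumptions chosen so the snippet runs anywhere without the seeds file; they are not part of the program above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import numpy as np
from matplotlib import pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D toy data: two small clusters, one per class.
points = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [1.0, 1.0], [0.9, 0.8], [0.8, 1.1]])
classes = np.array([0, 0, 0, 1, 1, 1])
model = KNeighborsClassifier(n_neighbors=3).fit(points, classes)

# Build a grid covering the plane, then classify every grid point.
xs = np.linspace(-0.5, 1.5, 50)
ys = np.linspace(-0.5, 1.5, 40)
X, Y = np.meshgrid(xs, ys)                      # both have shape (40, 50)
grid = np.column_stack([X.ravel(), Y.ravel()])  # shape (2000, 2)
C = model.predict(grid).reshape(X.shape)        # one predicted class per grid point

fig, ax = plt.subplots()
ax.pcolormesh(X, Y, C)  # colour each grid cell by its predicted class
ax.scatter(points[:, 0], points[:, 1], c=classes, edgecolors="k")
# Call plt.show() or fig.savefig(...) here to view the result.
```

The key point is the shape bookkeeping: meshgrid produces two 2-D coordinate arrays, they are flattened into a list of (x, y) rows for predict, and the predictions are reshaped back to the grid shape so pcolormesh can colour the cells.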
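Main program: enter 1 to run prediction, or 2 to draw the plot. The first option does the following:
a) creates the classifier,
b) calls KFold to generate the training and test splits,
c) trains and predicts,
d) computes the accuracy.
The code relies on list comprehensions and vectorized NumPy operations to stay concise.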
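The fit, predict and score loop in option 1 can also be delegated to scikit-learn's cross_val_score helper. The sketch below is one possible alternative, not the approach the program above takes. It uses the iris dataset as a stand-in for data/seeds.tsv so that it runs on its own, and random_state is fixed only for reproducibility.

```python
from sklearn import datasets
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): iris replaces data/seeds.tsv here.
features, labels = datasets.load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5)
# random_state is fixed only so the folds are the same on every run.
kf = KFold(n_splits=3, shuffle=True, random_state=0)

# Fits a fresh copy of knn on each training fold and scores it
# on the corresponding held-out fold.
scores = cross_val_score(knn, features, labels, cv=kf, scoring="accuracy")
print(scores)         # one accuracy per fold
print(scores.mean())  # average accuracy across the folds
```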