1、什么是kNN算法
k近邻(kNN)是指每个样本都可以用它最接近的k个邻居来代表。
核心思想:如果一个样本在特征空间中的k个最相邻的样本中大多数属于某一个类别,则该样本也属于这个类别。
2、将kNN封装成kNNClassifier
训练样本的特征在二维空间中的表示
kNN的训练过程如下图
完整代码
import numpy as np
from math import sqrt
from collections import Counter


class kNNClassifier():
    """k-nearest-neighbors classifier with a scikit-learn-like fit/predict API."""

    def __init__(self, k):
        """Initialize the kNN classifier.

        k: number of neighbors to vote; must be >= 1.
        """
        assert k >= 1, "k must be valid"
        self.k = k
        self._x_train = None
        self._y_train = None

    def fit(self, x_train, y_train):
        """Train the classifier on x_train (features) and y_train (labels).

        kNN is lazy: training just stores the data. Returns self for chaining.
        """
        assert x_train.shape[0] == y_train.shape[0], \
            "the size of x_train must be equal to the size of y_train"
        assert x_train.shape[0] >= self.k, \
            "the size of x_train must be at least k"
        self._x_train = x_train
        self._y_train = y_train
        return self

    def predict(self, X_predict):
        """Predict labels for the samples in X_predict.

        X_predict: 2-D array, one row per sample, same feature count as x_train.
        Returns a 1-D numpy array of predicted labels.
        """
        assert self._x_train is not None and self._y_train is not None, \
            "must fit before predict"
        assert X_predict.shape[1] == self._x_train.shape[1], \
            "the feature number of X_predict must be equal to x_train"
        y_predict = [self._predict(x) for x in X_predict]
        return np.array(y_predict)

    def _predict(self, x):
        """Predict the label of a single sample x (1-D feature vector)."""
        assert x.shape[0] == self._x_train.shape[1], \
            "the feature number of x must be equal to x_train"
        # Euclidean distance from x to every training sample.
        distances = [sqrt(np.sum((x_train - x) ** 2))
                     for x_train in self._x_train]
        nearest = np.argsort(distances)
        # BUG FIX: original used the global `k` (only defined in __main__);
        # the classifier must use its own self.k.
        topK_y = [self._y_train[i] for i in nearest[:self.k]]
        # Majority vote among the k nearest neighbors.
        votes = Counter(topK_y)
        return votes.most_common(1)[0][0]

    def __repr__(self):
        return "kNN(k=%d)" % self.k


if __name__ == "__main__":
    x_train = np.array([[0.31864691, 0.99608349],
                        [0.8609734, 0.40706129],
                        [0.86746155, 0.20136923],
                        [0.4346735, 0.17677379],
                        [0.42842348, 0.68055183],
                        [0.70661963, 0.76155652],
                        [0.73379517, 0.6123456],
                        [0.68330672, 0.52193524],
                        [0.11192091, 0.07885633],
                        [0.99273292, 0.62484263]])
    y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
    k = 6
    x = np.array([0.756789, 0.6123456])
    knn = kNNClassifier(k)
    knn.fit(x_train, y_train)
    x_predict = x.reshape(1, -1)
    print(knn.predict(x_predict))
2、测试结果
[1]