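Implementing Example 3.1 (p. 39) of 《统计学习方法》 (Statistical Learning Methods)
Input: a dataset, an instance X, the value of K, and the distance metric to use
Output: the K data points closest to X, together with their distances to X
Example 3.1: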
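First, define three distance metrics: Euclidean distance, Manhattan distance, and the maximum of the per-coordinate absolute differences.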
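For reference, these three metrics are the p = 2, p = 1 and p = ∞ cases of the L_p distance defined in the same chapter of the book, where x_i = (x_i^{(1)}, ..., x_i^{(n)}) is an n-dimensional point:

$$
\begin{aligned}
L_p(x_i, x_j) &= \Big( \sum_{l=1}^{n} \big| x_i^{(l)} - x_j^{(l)} \big|^{p} \Big)^{1/p}, \quad p \ge 1 \\
L_1(x_i, x_j) &= \sum_{l=1}^{n} \big| x_i^{(l)} - x_j^{(l)} \big| \quad \text{(Manhattan)} \\
L_2(x_i, x_j) &= \Big( \sum_{l=1}^{n} \big| x_i^{(l)} - x_j^{(l)} \big|^{2} \Big)^{1/2} \quad \text{(Euclidean)} \\
L_\infty(x_i, x_j) &= \max_{l} \big| x_i^{(l)} - x_j^{(l)} \big| \quad \text{(maximum coordinate difference)}
\end{aligned}
$$

For example, with x_i = (1, 1) and x_j = (4, 4) the per-coordinate differences are (3, 3), so L_1 = 6, L_2 = √18 ≈ 4.24 and L_∞ = 3.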
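Then, for the input instance X, compute under each metric the smallest distances and the coordinates of the corresponding nearest points.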
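The book's own Example 3.1 uses only three points, x1 = (1, 1), x2 = (5, 1) and x3 = (4, 4), and asks which point is nearest to x1 under L_p for different values of p. The sketch below recomputes those distances; `lp_distance` and `nearest_to_x1` are hypothetical helpers written for illustration, not the author's code.

```python
import numpy as np

# The three points of the book's Example 3.1
x1 = np.array([1, 1])
x2 = np.array([5, 1])
x3 = np.array([4, 4])


def lp_distance(a, b, p):
    """Hypothetical helper: L_p distance (sum_l |a_l - b_l|^p)^(1/p)."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


def nearest_to_x1(p):
    """Hypothetical helper: name of x1's nearest neighbour among x2, x3 under L_p."""
    d2 = lp_distance(x1, x2, p)
    d3 = lp_distance(x1, x3, p)
    return "x2" if d2 < d3 else "x3"


# Print p, L_p(x1, x2), L_p(x1, x3) and the resulting nearest neighbour
for p in range(1, 5):
    print(p, round(lp_distance(x1, x2, p), 2), round(lp_distance(x1, x3, p), 2), nearest_to_x1(p))
```

The nearest neighbour should come out as x2 for p = 1 and 2 and as x3 for p = 3 and 4: L_p(x1, x2) stays at 4 for every p because the two points differ only in the first coordinate, while L_p(x1, x3) = 3 · 2^{1/p} falls below 4 once p ≥ 3. The choice of metric alone can change which point is "nearest".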
Python code:
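```python
import math

import numpy as np

# Dataset (one 2-D point per row) and the query instance X
x = np.array([[5, 1], [4, 4], [2, 1], [3, 3], [4, 5], [7, 2]])
xx = [1, 1]


def Euclidean_distance(xi, yi):
    """
    Euclidean (L2) distance
    :param xi: first point, a sequence of coordinates
    :param yi: second point, same length as xi
    :return: square root of the sum of squared coordinate differences
    """
    sum_distance = 0
    for i in range(len(xi)):
        sum_distance += (xi[i] - yi[i]) ** 2
    return math.sqrt(sum_distance)


def Manhattan_distance(xi, yi):
    """
    Manhattan (L1) distance
    :param xi: first point
    :param yi: second point
    :return: sum of the absolute coordinate differences
    """
    sum_distance = 0
    for i in range(len(xi)):
        sum_distance += abs(xi[i] - yi[i])
    return sum_distance


def Max_distance(xi, yi):
    """
    Maximum of the per-coordinate absolute differences (L-infinity distance)
    :param xi: first point
    :param yi: second point
    :return: max over i of |xi[i] - yi[i]|
    """
    max_distance = 0
    for i in range(len(xi)):
        max_distance = max(max_distance, abs(xi[i] - yi[i]))
    return max_distance


def nearest_neighbor(xx, x, k, distance_fn):
    """
    Find the k points of x closest to xx under distance_fn
    :param xx: query instance X
    :param x: dataset, one point per row
    :param k: number of neighbours to return
    :param distance_fn: one of the three distance functions above
    :return: (the k smallest distances, the corresponding k points), nearest first
    """
    # Pair every point with its distance to xx (tolist() yields plain Python numbers)
    pairs = [(distance_fn(xx, point), point) for point in x.tolist()]
    # Sort by distance only. Sorting (distance, point) pairs keeps each point exactly
    # once even when several points share a distance; the sort is stable, so tied
    # points stay in dataset order.
    pairs.sort(key=lambda pair: pair[0])
    nearest = pairs[:k]
    return [d for d, _ in nearest], [point for _, point in nearest]


E, E_list = nearest_neighbor(xx, x, 2, Euclidean_distance)
M, M_list = nearest_neighbor(xx, x, 2, Manhattan_distance)
Ma, Ma_list = nearest_neighbor(xx, x, 2, Max_distance)
print(E)        # [1.0, 2.8284271247461903]
print(E_list)   # [[2, 1], [3, 3]]
print(M)        # [1, 4]
print(M_list)   # [[2, 1], [5, 1]]  ([5, 1] and [3, 3] tie at distance 4)
print(Ma)       # [1, 2]
print(Ma_list)  # [[2, 1], [3, 3]]
```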
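For larger datasets the per-point Python loops can be replaced with NumPy. The sketch below assumes `np.linalg.norm` with `axis=1` and `ord` set to 1, 2 or `np.inf`, which gives the Manhattan, Euclidean and maximum-coordinate distance of every row in one call; `nearest_neighbor_np` is a hypothetical name, not part of the original code.

```python
import numpy as np

# Same dataset and query instance as in the example
x = np.array([[5, 1], [4, 4], [2, 1], [3, 3], [4, 5], [7, 2]])
xx = np.array([1, 1])


def nearest_neighbor_np(xx, x, k, p):
    """
    Hypothetical vectorised k-nearest-neighbour search.
    p selects the norm: 1 = Manhattan, 2 = Euclidean, np.inf = max coordinate difference.
    Returns (distances, points), both ordered from nearest to farthest.
    """
    # Row-wise L_p norm of (sample - query) for every sample at once
    distances = np.linalg.norm(x - xx, ord=p, axis=1)
    # Stable sort so that samples at equal distance keep their dataset order
    order = np.argsort(distances, kind="stable")[:k]
    return distances[order].tolist(), x[order].tolist()


for p in (2, 1, np.inf):
    print(p, *nearest_neighbor_np(xx, x, 2, p))
```

Using `kind="stable"` matters for ties: with p = 1 both [5, 1] and [3, 3] are at distance 4 from X, and the stable sort returns [5, 1] because it comes first in the dataset, which matches the loop-based listing. The returned distances are floats here, since `np.linalg.norm` converts integer input to floating point.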
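To extend this to KNN classification later,
take the labels of the k nearest samples and assign instance X the label held by the majority of them (majority voting).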
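The majority-vote rule can be sketched as follows. The example itself has no labels, so the array `y` below is made up purely for illustration, and `knn_classify` is a hypothetical helper. Ties in the vote go to whichever tied label appears first among the neighbours sorted by distance; this is just one possible convention.

```python
from collections import Counter

import numpy as np

# Dataset from the example; the labels are HYPOTHETICAL, for illustration only
x = np.array([[5, 1], [4, 4], [2, 1], [3, 3], [4, 5], [7, 2]])
y = np.array([1, 1, -1, -1, 1, -1])
xx = np.array([1, 1])


def knn_classify(xx, x, y, k, p=2):
    """
    Hypothetical majority-vote kNN classifier under the L_p distance.
    Counter.most_common lists equal counts in first-encountered order, so a
    tied vote goes to the label of the nearer neighbour.
    """
    distances = np.linalg.norm(x - xx, ord=p, axis=1)
    order = np.argsort(distances, kind="stable")[:k]
    votes = Counter(y[order].tolist())
    return votes.most_common(1)[0][0]


print(knn_classify(xx, x, y, 3))
```

With k = 3 and the Euclidean distance, the neighbours of X = (1, 1) are [2, 1], [3, 3] and [5, 1], whose made-up labels are -1, -1 and 1, so the vote returns -1.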