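0 Preface
For the concepts behind these clustering algorithms, see this article I wrote: link
This article walks through the code most commonly used for cluster analysis with sklearn. All of the APIs below belong to sklearn.cluster, so keep that prefix in mind when using them.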
1 K-Means
.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=None, algorithm='auto')
n_clusters: the number of clusters, i.e. the value of k.
init: defaults to k-means++, which picks well-spread points as the initial cluster centers. It can also be set to random, which picks the initial centers at random before iterating.
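To make the two parameters above concrete, here is a minimal sketch. The toy data and the choice of random_state=0 are my own assumptions for illustration, and the example leaves out the signature's older parameters such as precompute_distances and n_jobs.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up toy data: two well-separated groups of 2-D points.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# n_clusters is k; init="k-means++" spreads out the initial centers.
kmeans = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(X)       # cluster index for each sample
centers = kmeans.cluster_centers_    # one row per cluster center

print(labels)
print(centers)

# Assign unseen points to the nearest learned center.
new_labels = kmeans.predict(np.array([[0.0, 0.0], [12.0, 3.0]]))
print(new_labels)
```

Which group gets index 0 and which gets index 1 is arbitrary, so compare labels with each other rather than against fixed numbers.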
.MiniBatchKMeans(n_clusters=8, init='k-means++', max_iter=100, batch_size=100, verbose=0, compute_labels=True, random_state=None, tol=0.0, max_no_improvement=10, init_size=None, n_init=3, reassignment_ratio=0.01)
The mini-batch idea: each iteration trains on a randomly selected subset of the samples. The result is only slightly worse than without mini-batches, but training is much faster.
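A rough sketch of the mini-batch variant follows. The synthetic data (two Gaussian blobs) and the seed are assumptions of mine, chosen only so that the two groups are easy to separate.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic data (an assumption for illustration): two blobs of 200 points each.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(200, 2)),
               rng.normal(5.0, 0.5, size=(200, 2))])

# Each iteration updates the centers from a random batch of batch_size samples.
mbk = MiniBatchKMeans(n_clusters=2, batch_size=100, n_init=3, random_state=0)
labels = mbk.fit_predict(X)

print(mbk.cluster_centers_)
print(np.bincount(labels))  # number of samples assigned to each cluster
```

For data too large to fit in memory, the same estimator also offers partial_fit, which can be called once per chunk.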
2 Affinity Propagation
.AffinityPropagation(damping=0.5, max_iter=200, convergence_iter=15, copy=True, preference=None, affinity='euclidean', verbose=False)
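Unlike K-Means, affinity propagation does not take the number of clusters as input; it picks exemplar samples on its own. A small sketch on made-up data is below. Note that random_state is not in the signature quoted above: it exists in newer scikit-learn releases, and I pass it here only for reproducibility.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Made-up toy data: two groups of three points.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [4.0, 2.0], [4.0, 4.0], [4.0, 0.0]])

# damping controls how strongly messages are smoothed between iterations.
# random_state is an assumption: it requires a newer scikit-learn release.
ap = AffinityPropagation(damping=0.5, random_state=5)
labels = ap.fit_predict(X)

# Indices of the samples chosen as exemplars (cluster centers).
exemplar_idx = ap.cluster_centers_indices_
print(labels)
print(X[exemplar_idx])
```

Lowering preference tends to produce fewer clusters, and raising it tends to produce more.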
3 Mean Shift
.MeanShift(bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True, n_jobs=None)
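Here is a minimal sketch of MeanShift on made-up data. The bandwidth value of 2 is my own choice for this toy set; if bandwidth is left as None, scikit-learn estimates it from the data.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Made-up toy data: two loose groups of points.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 0.0],
              [4.0, 7.0], [3.0, 5.0], [3.0, 6.0]])

# bandwidth is the kernel radius; 2 is an assumed value that suits this data.
ms = MeanShift(bandwidth=2)
labels = ms.fit_predict(X)
centers = ms.cluster_centers_   # the density modes that were found

print(labels)
print(centers)
```

The number of clusters is not specified anywhere; it falls out of the bandwidth.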
4 Spectral Clustering
.SpectralClustering(n_clusters=8, eigen_solver=None, random_state=None, n_init=10, gamma=1.0, affinity='rbf', n_neighbors=10, eigen_tol=0.0, assign_labels='kmeans', degree=3, coef0=1, kernel_params=None, n_jobs=None)
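A small sketch of SpectralClustering on made-up data follows. I use assign_labels="discretize" instead of the default 'kmeans' and fix random_state; both are choices of mine for a reproducible toy run, not recommendations.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Made-up toy data: two groups of three points.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 0.0],
              [4.0, 7.0], [3.0, 5.0], [3.0, 6.0]])

# The default affinity='rbf' builds the similarity graph from a Gaussian kernel.
sc = SpectralClustering(n_clusters=2, assign_labels="discretize",
                        random_state=0)
labels = sc.fit(X).labels_

print(labels)
```

SpectralClustering has no predict method, so labels are only available for the data passed to fit.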
5 Hierarchical Clustering
.AgglomerativeClustering(n_clusters=2, affinity='euclidean', memory=None, connectivity=None, compute_full_tree='auto', linkage='ward', pooling_func='deprecated', distance_threshold=None)
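Below is a minimal sketch of agglomerative (bottom-up) clustering on made-up data. It only sets n_clusters and linkage, and leaves affinity at its default because that parameter was renamed in later scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up toy data: two columns of points.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [4.0, 2.0], [4.0, 4.0], [4.0, 0.0]])

# linkage='ward' merges the pair of clusters that least increases variance.
agg = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = agg.fit_predict(X)

print(labels)
print(agg.children_)  # the merge history, one row per merge step
```

Other linkage options are 'complete', 'average' and 'single'.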
6 DBSCAN
.DBSCAN(eps=0.5, min_samples=5, metric='euclidean', metric_params=None, algorithm='auto', leaf_size=30, p=None, n_jobs=None)
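The two parameters that matter most in practice are eps and min_samples. A minimal sketch on made-up data is below; the values eps=3 and min_samples=2 are assumptions that fit this toy set, not general defaults.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up toy data: two dense groups plus one far-away point.
X = np.array([[1.0, 2.0], [2.0, 2.0], [2.0, 3.0],
              [8.0, 7.0], [8.0, 8.0], [25.0, 80.0]])

# eps: neighborhood radius; min_samples: neighbors needed to be a core point.
db = DBSCAN(eps=3, min_samples=2)
labels = db.fit_predict(X)

print(labels)                    # noise points are labelled -1
print(db.core_sample_indices_)   # indices of the core samples
```

Here the last point has no neighbor within eps, so it is reported as noise rather than forced into a cluster.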
7 OPTICS
.OPTICS(min_samples=5, max_eps=inf, metric='minkowski', p=2, metric_params=None, cluster_method='xi', eps=None, xi=0.05, predecessor_correction=True, min_cluster_size=None, algorithm='auto', leaf_size=30, n_jobs=None)
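A small sketch of OPTICS on made-up data follows. To keep the result easy to read I set cluster_method="dbscan" with eps=3 instead of the default 'xi' method; that choice and the data are assumptions of mine.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Made-up toy data: two dense groups plus one far-away point.
X = np.array([[1.0, 2.0], [2.0, 2.0], [2.0, 3.0],
              [8.0, 7.0], [8.0, 8.0], [25.0, 80.0]])

# OPTICS first orders the points by reachability, then extracts clusters.
# cluster_method="dbscan" cuts the reachability plot at the given eps.
opt = OPTICS(min_samples=2, cluster_method="dbscan", eps=3)
labels = opt.fit_predict(X)

print(labels)                              # noise points are labelled -1
print(opt.ordering_)                       # processing order of the samples
print(opt.reachability_[opt.ordering_])    # reachability plot values
```

Because the ordering is computed once, different eps cuts can be tried afterwards without refitting, for example with sklearn.cluster.cluster_optics_dbscan.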
8 Birch
.Birch(threshold=0.5, branching_factor=50, n_clusters=3, compute_labels=True, copy=True)
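Here is a minimal sketch of Birch on made-up data. I pass n_clusters=None so that the subclusters of the CF tree are returned directly, without the final global clustering step; that setting and the data are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import Birch

# Made-up toy data: one group around y=1 and one around y=-1.
X = np.array([[0.0, 1.0], [0.3, 1.0], [-0.3, 1.0],
              [0.0, -1.0], [0.3, -1.0], [-0.3, -1.0]])

# threshold bounds the radius of a subcluster; a new point that would
# exceed it starts a new subcluster instead of being merged.
brc = Birch(threshold=0.5, branching_factor=50, n_clusters=None)
brc.fit(X)
labels = brc.predict(X)

print(labels)
print(brc.subcluster_centers_)  # centroids of the leaf subclusters
```

With an integer n_clusters, the subcluster centroids are additionally grouped into that many final clusters.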