Clustering Roadmap (Algorithm Selection)

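For general experiments, the sklearn package is the usual choice. Picking an algorithm comes down to two factors: the amount of data and the distribution of the samples. The two figures and the link below are a good basis for a rough first selection.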

Link: http://sklearn.apachecn.org/cn/0.19.0/modules/clustering.html#different-linkage-type-ward-complete-and-average-linkage
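To make the "sample distribution" factor concrete, here is a minimal sketch that compares K-Means and DBSCAN on a toy dataset with non-flat geometry. The dataset (`make_moons`), the parameter values (`eps=0.3`, `min_samples=5`) and the use of the adjusted Rand index as a score are assumptions chosen for illustration; they are not part of the original post.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Two interleaving half-moons: a small example of non-flat geometry.
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# K-Means assumes roughly convex, evenly sized clusters (flat geometry).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups points by density, so it can follow curved shapes.
# eps and min_samples are illustrative values, not tuned recommendations.
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)

print(f"K-Means ARI: {kmeans_ari:.2f}")
print(f"DBSCAN  ARI: {dbscan_ari:.2f}")
```

On real data there are usually no ground-truth labels, so a comparison like this only works as a sanity check on synthetic data that resembles the expected shape of your clusters.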

| Method name | Parameters | Scalability | Use case | Geometry (metric used) |
| --- | --- | --- | --- | --- |
| K-Means | number of clusters | Very large n_samples, medium n_clusters with MiniBatch code | General-purpose, even cluster size, flat geometry, not too many clusters | Distances between points |
| Affinity propagation | damping, sample preference | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
| Mean-shift | bandwidth | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Distances between points |
| Spectral clustering | number of clusters | Medium n_samples, small n_clusters | Few clusters, even cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
| Ward hierarchical clustering | number of clusters | Large n_samples and n_clusters | Many clusters, possibly connectivity constraints | Distances between points |
| Agglomerative clustering | number of clusters, linkage type, distance | Large n_samples and n_clusters | Many clusters, possibly connectivity constraints, non Euclidean distances | Any pairwise distance |
| DBSCAN | neighborhood size | Very large n_samples, medium n_clusters | Non-flat geometry, uneven cluster sizes | Distances between nearest points |
| Gaussian mixtures | many | Not scalable | Flat geometry, good for density estimation | Mahalanobis distances to centers |
| Birch | branching factor, threshold, optional global clusterer | Large n_clusters and n_samples | Large dataset, outlier removal, data reduction | Euclidean distance between points |
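As a rough guide to how the "Parameters" column maps onto scikit-learn constructors, the sketch below instantiates one estimator per table row and fits it on a small toy dataset. All parameter values are assumptions picked so the example runs quickly; they are not tuning advice, and the mapping of "neighborhood size" to `eps` and of "optional global clusterer" to Birch's `n_clusters` argument is my reading of the table.

```python
from sklearn.cluster import (
    AffinityPropagation,
    AgglomerativeClustering,
    Birch,
    DBSCAN,
    KMeans,
    MeanShift,
    MiniBatchKMeans,
    SpectralClustering,
)
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Small toy dataset so that even the non-scalable methods finish quickly.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# One estimator per table row; values are illustrative assumptions only.
estimators = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "MiniBatch K-Means": MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0),
    "Affinity propagation": AffinityPropagation(damping=0.9, preference=-50),
    "Mean-shift": MeanShift(bandwidth=2.0),
    "Spectral clustering": SpectralClustering(n_clusters=3, random_state=0),
    "Ward hierarchical clustering": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "Agglomerative clustering": AgglomerativeClustering(n_clusters=3, linkage="average"),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    "Gaussian mixtures": GaussianMixture(n_components=3, random_state=0),
    "Birch": Birch(branching_factor=50, threshold=0.5, n_clusters=3),
}

results = {}
for name, estimator in estimators.items():
    labels = estimator.fit_predict(X)
    # DBSCAN marks noise points with the label -1, which is not a cluster.
    results[name] = len(set(labels) - {-1})
    print(f"{name:30s} clusters found: {results[name]}")
```

Methods that take a number of clusters return exactly that many, while Affinity propagation, Mean-shift and DBSCAN decide the count themselves from their own parameters, which is worth keeping in mind when the number of clusters is unknown.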

Reposted from blog.csdn.net/u012863603/article/details/84302316