
python - How to make mean shift clustering work for more than five clusters?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:06:42

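I'm having trouble with mean shift clustering. When the number of clusters is small (2, 3, 4), it runs very fast and outputs correct results, but it fails when the number of clusters increases.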

For example, detecting 3 clusters works fine: [image: cluster success]

But it fails when the number increases: [images: cluster centers fail, clusters fail]

Here is the complete code listing:

#!/usr/bin/env python

import logging

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: figures are written to files
import matplotlib.pyplot as plot

from sklearn.cluster import estimate_bandwidth, MeanShift
# make_blobs is imported from sklearn.datasets; the old
# sklearn.datasets.samples_generator module is gone in newer releases.
from sklearn.datasets import make_blobs


def test_mean_shift():
    logging.debug('Generating mixture')
    count = 5000
    blocks = 7
    std_error = 0.5
    mixture, clusters = make_blobs(n_samples=count, centers=blocks,
                                   cluster_std=std_error)

    logging.debug('Measuring bandwidth')
    # Heuristic bandwidth estimate with the default settings
    bandwidth = estimate_bandwidth(mixture)
    logging.debug('Bandwidth: %r' % bandwidth)

    mean_shift = MeanShift(bandwidth=bandwidth)

    logging.debug('Clustering')
    mean_shift.fit(mixture)

    shifted = mean_shift.cluster_centers_
    guess = mean_shift.labels_

    logging.debug('Centers: %r' % shifted)

    def draw_mixture(mixture, clusters, output='mixture.png'):
        # Scatter plot of the points, colored by cluster label
        plot.clf()
        plot.scatter(mixture[:, 0], mixture[:, 1],
                     c=clusters,
                     cmap=plot.cm.coolwarm)
        plot.savefig(output)

    def draw_mixture_shifted(mixture, shifted, output='mixture_shifted.png'):
        # Points in red, detected cluster centers in blue
        plot.clf()
        plot.scatter(mixture[:, 0], mixture[:, 1], c='r')
        plot.scatter(shifted[:, 0], shifted[:, 1], c='b')
        plot.savefig(output)

    logging.debug('Drawing')
    draw_mixture_shifted(mixture, shifted)
    draw_mixture(mixture, guess)


if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)

    test_mean_shift()

What am I doing wrong?

Best answer

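You will probably have to choose a smaller bandwidth. I am not very familiar with the heuristic used to choose it, so the "problem" here is the heuristic, not the actual algorithm.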
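One way to try a smaller bandwidth is sketched below. It relies on the `quantile` parameter of `estimate_bandwidth`, which defaults to 0.3. A plausible reading of the failure is that with seven equally sized clusters each cluster holds only about 14% of the points, so a 0.3 quantile reaches into neighboring clusters and gives a bandwidth wide enough to merge them. The fixed center layout, the sample count of 700, the value `quantile=0.1`, and the helper `count_clusters` are all assumptions made for this illustration and are not part of the original question.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Hypothetical layout: 7 well-separated centers, 10 units apart, so the
# result does not depend on where make_blobs would place random centers.
centers = np.array([[-10, -10], [-10, 0], [-10, 10],
                    [0, -10], [0, 0], [0, 10],
                    [10, -10]], dtype=float)
mixture, _ = make_blobs(n_samples=700, centers=centers,
                        cluster_std=0.5, random_state=0)


def count_clusters(data, quantile):
    """Estimate a bandwidth for the given quantile, then run mean shift.

    Hypothetical helper for this sketch; returns (bandwidth, n_clusters).
    """
    bandwidth = estimate_bandwidth(data, quantile=quantile)
    model = MeanShift(bandwidth=bandwidth).fit(data)
    return bandwidth, len(model.cluster_centers_)


# Default heuristic (quantile=0.3) versus a quantile below 1/7
bw_default, n_default = count_clusters(mixture, quantile=0.3)
bw_small, n_small = count_clusters(mixture, quantile=0.1)

print('quantile=0.3: bandwidth=%.2f, clusters=%d' % (bw_default, n_default))
print('quantile=0.1: bandwidth=%.2f, clusters=%d' % (bw_small, n_small))
```

On data like this, the smaller quantile should give a noticeably smaller bandwidth and more detected clusters than the default. A suitable quantile depends on the data, so treat 0.1 as a starting point to tune, or pass an explicit `bandwidth` to `MeanShift` if the cluster scale is known.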

Regarding "python - How to make mean shift clustering work for more than five clusters?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14548370/
