python - k-means algorithm not working

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 04:58:39

I am trying to implement the k-means algorithm in Python 3 using Numpy. My input data is a simple n x 2 matrix of points:

[[1, 2],
[3, 4],
...
[7, 13]]

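For some reason, my labels never stay the same from one iteration to the next: every single label changes at each step. Does anyone see anything obviously wrong with what I am doing? I have added some comments to the code so that the individual steps are easier to follow.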

def kmeans(X,k):

    # Initialize by choosing k random data points as centroids
    num_features = X.shape[1]
    centroids = X[np.random.randint(X.shape[0], size=k), :] # find k centroids
    iterations = 0
    old_labels, labels = [], []

    while not should_stop(old_labels, labels, iterations):
        iterations += 1

        clusters = [[] for i in range(0,k)]
        for i in range(k):
            clusters[i].append(centroids[i])

        # Label points
        old_labels = labels
        labels = []
        for point in X:
            distances = [np.linalg.norm(point-centroid) for centroid in centroids]
            max_centroid = np.argmax(distances)
            labels.append(max_centroid)
            clusters[max_centroid].append(point)

        # Compute new centroids
        centroids = np.empty(shape=(0,num_features))
        for cluster in clusters:
            avgs = sum(cluster)/len(cluster)
            centroids = np.append(centroids, [avgs], axis=0)

    return labels

def should_stop(old_labels, labels, iterations):
    count = 0
    if len(old_labels) == 0:
        return False
    for i in range(len(labels)):
        count += (old_labels[i] != labels[i])
    print(count)
    if old_labels == labels or iterations == 2000:
        return True
    return False

Best answer

max_centroid = np.argmax(distances)

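You want the centroid that minimizes the distance to the point, not the one that maximizes it. With np.argmax, each point is assigned to its farthest centroid, so the assignments never settle; np.argmin assigns each point to its nearest centroid instead.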
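To make the fix concrete, here is a minimal sketch of the same loop with the nearest-centroid assignment. It is not the asker's code: the name `kmeans_fixed`, the `max_iterations` and `seed` parameters, the vectorized distance computation, and the choice to leave a centroid in place when its cluster is empty are all assumptions made for illustration.

```python
import numpy as np

def kmeans_fixed(X, k, max_iterations=2000, seed=None):
    """Illustrative k-means loop that assigns each point to its NEAREST centroid."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    labels = None

    for _ in range(max_iterations):
        # distances[i, j] is the Euclidean distance from point i to centroid j.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # The fix from the answer: argmin (nearest), not argmax (farthest).
        new_labels = np.argmin(distances, axis=1)

        # Stop once the assignment no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: keep the old centroid if the cluster is empty
                centroids[j] = members.mean(axis=0)

    return labels, centroids

if __name__ == "__main__":
    points = np.array([[1, 2], [3, 4], [2, 3], [7, 13], [8, 12], [9, 14]])
    labels, centroids = kmeans_fixed(points, k=2, seed=0)
    print(labels)
    print(centroids)
```

In the original function the equivalent change is a single line: replace np.argmax(distances) with np.argmin(distances). Renaming max_centroid to something like nearest_centroid would also make the intent clearer.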

Regarding "python - k-means algorithm not working", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40813272/
