python - K-Means clustering from scratch in Python - redefining centroids

Reposted. Author: 行者123. Updated: 2023-11-30 09:05:54

I am trying to do K-Means clustering from scratch in Python. Here is my code; something is wrong with the way I redefine the centroids.

This is the output I get:

Iteration 1:
[1.5, 8.1] [8.04, 1.525]
Iteration 2:
[4.98, 4.05] [2.87, 4.09]
Iteration 3:
[9.29, 8.28] [8.57, 7.87]
Iteration 4:
[9.97, 8.94] [inf, inf]

Thanks in advance!

import pandas as pd

# Example dataset
data = pd.DataFrame({'x' : [6.480, 7.320, 4.380, 8.040, 7.680, 6.600, 6.420, 5.940, 4.140, 5.700,
                            7.500, 7.620, 6.840, 7.500, 4.920, 3.780, 7.860, 4.260, 7.980, 6.840,
                            3.025, 2.300, 3.250, 2.975, 3.325, 1.500, 1.875, 2.850, 1.600, 2.525,
                            2.900, 2.175, 2.050, 1.650, 2.250, 3.475, 1.800, 2.975, 3.025, 2.175 ],

                     'y' : [6.300, 5.220, 6.060, 4.560, 7.080, 4.740, 3.660, 4.680, 4.800, 5.880,
                            8.100, 7.800, 3.900, 6.780, 4.860, 5.100, 4.380, 5.160, 5.520, 5.700,
                            2.125, 3.475, 2.500, 2.875, 2.075, 3.350, 1.525, 3.050, 2.950, 2.150,
                            2.125, 2.550, 3.375, 1.950, 1.700, 2.400, 2.525, 2.525, 2.675, 3.325]})

data['Cluster'] = 0
data['EuclideanDist1'] = 0
data['EuclideanDist2'] = 0
data['EuclideanDistD'] = 0

iterations = 0

C1nx = C1ny = 0
C2nx = C2ny = 0
C1c = 0
C2c = 0

C1 = [min(data['x']), max(data['y'])]
C2 = [max(data['x']), min(data['y'])]

count = 0

while(iterations < 40):
    print(C1, C2)
    for count in range(0, len(data)-1):

        data['EuclideanDist1'][count] = ((data['x'][count] - C1[0])**2 + (data['y'][count] - C1[1])**2)**(0.5)
        data['EuclideanDist2'][count] = ((data['x'][count] - C2[0])**2 + (data['y'][count] - C2[1])**2)**(0.5)
        data['EuclideanDistD'][count] = data['EuclideanDist1'][count] - data['EuclideanDist2'][count]

        if data['EuclideanDistD'][count] >= 0:
            data['Cluster'][count] = 1
            C1nx = C1nx + data['x'][count]
            C1ny = C1ny + data['y'][count]
            C1c = C1c + 1

        elif data['EuclideanDistD'][count] < 0:
            data['Cluster'][count] = 2
            C2nx = C2nx + data['x'][count]
            C2ny = C2ny + data['y'][count]
            C2c = C2c + 1

    C1[0] = (C1nx / C1c)
    C1[1] = (C1ny / C1c)
    C2[0] = (C2nx / C2c)
    C2[1] = (C2ny / C2c)

    C1n = [0,0]
    C2n = [0,0]
    C1c = 0
    C2c = 0

    iterations = iterations + 1

Best answer

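  1. K-means uses squared Euclidean distance. Remove the square root; it does not change which centroid is nearest.
  2. Clusters can become empty if the starting conditions are bad. You then get a division by 0 and NaN values.
  3. Your logic is wrong. You assign each point to the farthest cluster, which is why you keep running into the problem above. Also, this approach is not easy to extend to k > 2.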
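
The three points above can be sketched in plain Python as follows. This is a minimal illustration rather than the answerer's own code: the names `squared_distance` and `kmeans_plain` are hypothetical, and keeping the previous centroid when a cluster becomes empty is only one assumed policy (re-seeding the centroid is another common choice).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points (no square root)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_plain(points, centroids, max_iter=40):
    """Hypothetical helper: k-means on a list of (x, y) tuples, for any k."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its NEAREST centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: members are collected afresh on every pass,
        # so no sums or counts carry over from the previous iteration.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                # Assumption: an empty cluster keeps its previous centroid,
                # which avoids the division by zero.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Example call with two starting centroids.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centroids, final_labels = kmeans_plain(pts, [(0, 0), (10, 10)])
```

Because the centroids live in a list and the nearest one is found with `min`, moving to k > 2 only requires passing more starting centroids.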

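Avoid stacked lookups of the form [a][b] inside the loop. They are hard to read and slow. Use local variables where appropriate. Since Python is fairly slow in interpreted mode, use vectorized numpy operations wherever possible to benefit from the faster C/Fortran code.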
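
As a rough sketch of what the vectorized advice might look like, the version below computes all point-to-centroid squared distances in one broadcast operation. The function name `kmeans_numpy`, the small demo array, and the choice to leave an empty cluster's centroid unchanged are assumptions made for illustration; the starting centroids follow the question's (min x, max y) and (max x, min y) pattern.

```python
import numpy as np


def kmeans_numpy(X, centroids, max_iter=40):
    """Hypothetical helper. X has shape (n, d); centroids has shape (k, d)."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(centroids, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Squared distances from every point to every centroid, shape (n, k).
        diff = X[:, None, :] - centroids[None, :, :]
        sq_dist = (diff ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)  # index of the nearest centroid

        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            mask = labels == j
            if mask.any():  # assumption: empty clusters keep their old centroid
                new_centroids[j] = X[mask].mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Small demo array (made up for illustration, not the question's dataset).
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
start = [[X[:, 0].min(), X[:, 1].max()],
         [X[:, 0].max(), X[:, 1].min()]]
centroids, labels = kmeans_numpy(X, start)
```

With the question's DataFrame, the input array could be obtained with `data[['x', 'y']].to_numpy()`, so the per-row `data[col][count]` lookups disappear entirely.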

Regarding "python - K-Means clustering from scratch in Python - redefining centroids", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52066593/
