gpt4 book ai didi

python - 能力k均值聚类?

转载 作者:塔克拉玛干 更新时间:2023-11-03 05:08:25 25 4
gpt4 key购买 nike

我是算法和优化新手。
我正在尝试实现有能力的k-means,但是到目前为止还没有得到解决,而且效果很差。
这是CVRP仿真(电容车辆路径问题)的一部分。
我很好奇我是否把引用的算法解释错了。
参考:"Improved K-Means Algorithm for Capacitated Clustering Problem"(Geetha, Poonthalir, Vanathi)
模拟的CVRP有15个客户,1个仓库。
每个客户都有欧氏坐标(x,y)和需求。
共有3辆车,每辆车的容量为90辆。
因此,有能力的k-means试图将15个客户分为3辆车,每辆车的总需求不得超过车辆容量。
更新:
在引用的算法中,我无法捕捉到任何有关代码在“下一个最近的质心”用完时必须执行的操作的信息。
也就是说,当在下面的步骤14.b中检查了所有“最近的质心”时,customers[1]仍然未赋值。
这将导致索引1未分配的客户。
注:customer[1]是需求最大的客户(30)。
问:当满足这个条件时,代码应该怎么做?
这是我对参考算法的解释,请更正我的代码,谢谢。
给定n请求者(客户),n=customerCount和一个仓库
n个要求,
N坐标(X,Y)
计算集群数量,k=(所有需求之和)/vehicleCapacity
选择初始质心,
5.a.根据demand对客户进行排序,降序排列=d_customers
5.b.选择k中的第一个客户作为初始质心=d_customers
创建二进制矩阵centroids[0 .. k-1],维数=bin_matrix
6.a.用所有零填充(customerCount) x (k)
启动WHILE循环,条件=WHILEbin_matrix
7.a.not converged
循环开始,条件=forconverged = False
8.A.客户指数=I
计算从each customers到所有customers[i]centroids的欧氏距离
9.a.按升序排序edist
9.b.首先选择距离最近的edist
启动while循环,条件=centroid未分配给任何群集。
将所有其他未分配的客户分组=closest_centroid
11.a.将while customers[i]视为G的质心。
为每个closest_centroidG计算优先级Pi
12.a.优先级customers
12.b.选择优先级最高的客户。
12.c.优先级最高的客户具有索引=G
12.D.Q:如果找不到最高优先级的客户,我们该怎么办?
如果可能,将Pi = (distance from customers[i] to closest_cent) / demand[i]分配给Pi
13.a.hpc=customers[hpc]的需求,
13.b.质心构件的所有需求之和=centroids[closest_centroid]
13.c.customers[hpc]。。
13.d.将d1分配给dtot
13.e.updateIF (d1 + dtot) <= vehicleCapacity, THEN,row index=customers[hpc],column index=centroids[closest_centroid],设置为bin_matrix
如果hpc仍然closest_centroid到任何群集,则。。
14.a.选择1,距离customers[i]最近。
14.B.Q:如果没有下一个最近的质心,我们该怎么办?
通过比较前一个矩阵和更新后的矩阵bin_矩阵计算收敛。
15.a.如果not assigned没有变化,则设置next nearest centroid
否则,从更新的集群计算edist
16.a.根据每个集群的成员计算新的bin_matrix
16.b.converged = True=簇的所有new centroids之和,
16.c.centroids' coordinates=群集中所有sum_x的数目,
16.d.星团的新质心x-coordinate
16.e.用相同的公式,计算新的星团质心members
迭代main while循环。
在步骤14.b中,我的代码总是以未分配的客户结束。
也就是说,有一个num_c仍然没有分配给任何质心,并且它已经用完“下一个最近的质心”。
由此产生的集群很差。输出图:
-在图中,星星是质心,正方形是仓库。
在pic中,客户标记为“1”,demand=30总是以没有指定的集群结束。
程序的输出,

k_cluster 3
idx [ 1 -1 1 0 2 0 1 1 2 2 2 0 0 2 0]
centroids [(22.6, 29.2), (34.25, 60.25), (39.4, 33.4)]
members [[3, 14, 12, 5, 11], [0, 2, 6, 7], [9, 8, 4, 13, 10]]
demands [86, 65, 77]

第一和第三个簇的计算结果很差。
未分配索引为“ customers (members)”的 x-coordinatesum_x / num_c
问:我的解释和执行有什么问题?
任何更正、建议、帮助,将不胜感激,提前谢谢。
这是我的完整代码:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# pastebin.com/UwqUrHhh
# output graph: i.imgur.com/u3v2OFt.png

import math
import random
from operator import itemgetter
from copy import deepcopy
import numpy
import pylab

# depot and customers, [index, x, y, demand]
depot = [0, 30.0, 40.0, 0]
customers = [[1, 37.0, 52.0, 7], \
[2, 49.0, 49.0, 30], [3, 52.0, 64.0, 16], \
[4, 20.0, 26.0, 9], [5, 40.0, 30.0, 21], \
[6, 21.0, 47.0, 15], [7, 17.0, 63.0, 19], \
[8, 31.0, 62.0, 23], [9, 52.0, 33.0, 11], \
[10, 51.0, 21.0, 5], [11, 42.0, 41.0, 19], \
[12, 31.0, 32.0, 29], [13, 5.0, 25.0, 23], \
[14, 12.0, 42.0, 21], [15, 36.0, 16.0, 10]]
customerCount = 15
vehicleCount = 3
vehicleCapacity = 90
assigned = [-1] * customerCount

# number of clusters
k_cluster = 0
# binary matrix
bin_matrix = []
# coordinate of centroids
centroids = []
# total demand for each cluster, must be <= capacity
tot_demand = []
# members of each cluster
members = []
# coordinate of members of each cluster
xy_members = []

def distance(p1, p2):
return math.sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2)

# capacitated k-means clustering
# http://www.dcc.ufla.br/infocomp/artigos/v8.4/art07.pdf
def cap_k_means():
global k_cluster, bin_matrix, centroids, tot_demand
global members, xy_members, prev_members

# calculate number of clusters
tot_demand = sum([c[3] for c in customers])
k_cluster = int(math.ceil(float(tot_demand) / vehicleCapacity))
print 'k_cluster', k_cluster

# initial centroids = first sorted-customers based on demand
d_customers = sorted(customers, key=itemgetter(3), reverse=True)
centroids, tot_demand, members, xy_members = [], [], [], []
for i in range(k_cluster):
centroids.append(d_customers[i][1:3]) # [x,y]

# initial total demand and members for each cluster
tot_demand.append(0)
members.append([])
xy_members.append([])

# binary matrix, dimension = customerCount-1 x k_cluster
bin_matrix = [[0] * k_cluster for i in range(len(customers))]

converged = False
while not converged: # until no changes in formed-clusters
prev_matrix = deepcopy(bin_matrix)

for i in range(len(customers)):
edist = [] # list of distance to clusters

if assigned[i] == -1: # if not assigned yet
# Calculate the Euclidean distance to each of k-clusters
for k in range(k_cluster):
p1 = (customers[i][1], customers[i][2]) # x,y
p2 = (centroids[k][0], centroids[k][1])
edist.append((distance(p1, p2), k))

# sort, based on closest distance
edist = sorted(edist, key=itemgetter(0))

closest_centroid = 0 # first index of edist
# loop while customer[i] is not assigned
while assigned[i] == -1:
# calculate all unsigned customers (G)'s priority
max_prior = (0, -1) # value, index
for n in range(len(customers)):
pc = customers[n]

if assigned[n] == -1: # if unassigned
# get index of current centroid
c = edist[closest_centroid][1]
cen = centroids[c] # x,y

# distance_cost / demand
p = distance((pc[1], pc[2]), cen) / pc[3]

# find highest priority
if p > max_prior[0]:
max_prior = (p, n) # priority,customer-index

# if highest-priority is not found, what should we do ???
if max_prior[1] == -1:
break

# try to assign current cluster to highest-priority customer
hpc = max_prior[1] # index of highest-priority customer
c = edist[closest_centroid][1] # index of current cluster

# constraint, total demand in a cluster <= capacity
if tot_demand[c] + customers[hpc][3] <= vehicleCapacity:
# assign new member of cluster
members[c].append(hpc) # add index of customer

xy = (customers[hpc][1], customers[hpc][2]) # x,y
xy_members[c].append(xy)

tot_demand[c] += customers[hpc][3]
assigned[hpc] = c # update cluster to assigned-customer

# update binary matrix
bin_matrix[hpc][c] = 1

# if customer is not assigned then,
if assigned[i] == -1:
if closest_centroid < len(edist)-1:
# choose the next nearest centroid
closest_centroid += 1

# if run out of closest centroid, what must we do ???
else:
break # exit without centroid ???

# end while
# end for

# Calculate the new centroid from the formed clusters
for j in range(k_cluster):
xj = sum([cn[0] for cn in xy_members[j]])
yj = sum([cn[1] for cn in xy_members[j]])
xj = float(xj) / len(xy_members[j])
yj = float(yj) / len(xy_members[j])
centroids[j] = (xj, yj)

# calculate converged
converged = numpy.array_equal(numpy.array(prev_matrix), numpy.array(bin_matrix))
# end while

def clustering():
cap_k_means()

# debug plot
idx = numpy.array([c for c in assigned])
xy = numpy.array([(c[1], c[2]) for c in customers])

COLORS = ["Blue", "DarkSeaGreen", "DarkTurquoise",
"IndianRed", "MediumVioletRed", "Orange", "Purple"]

for i in range(min(idx), max(idx)+1):
clr = random.choice(COLORS)
pylab.plot(xy[idx==i, 0], xy[idx==i, 1], color=clr, \
linestyle='dashed', \
marker='o', markerfacecolor=clr, markersize=8)
pylab.plot(centroids[:][0], centroids[:][1], '*k', markersize=12)
pylab.plot(depot[1], depot[2], 'sk', markersize=12)

for i in range(len(idx)):
pylab.annotate(str(i), xy[i])

pylab.savefig('clust1.png')
pylab.show()

return idx

def main():
idx = clustering()
print 'idx', idx
print 'centroids', centroids
print 'members', members
print 'demands', tot_demand

if __name__ == '__main__':
main()

最佳答案

当总需求接近总容量时,这个问题开始呈现出bin packing的一些方面。正如您所发现的,这个特定算法的贪婪方法并不总是成功的我不知道作者是否承认这一点,但如果他们不承认,评论者应该抓住它。
如果您想继续使用这种算法,我将尝试使用integer programming将请求者分配给质心。

关于python - 能力k均值聚类?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/17992189/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com