python - Problem with PyCluster

Reposted by: 太空宇宙 · Updated: 2023-11-03 17:41:30

I have the following Python code:

   from Pycluster import *
   from numpy import *
   import matplotlib.pyplot as plt

   names = [ "A1", "A2", "A3", "A4", "A5", "A6", "A7", 
             "A8", "A9", "A10", "A11", "A12", "A13", "A14", "A15"]

   distances = array([
   [0.000, 0.840, 0.860, 0.115, 0.150, 0.055, 0.000, 0.070, 0.065, 0.000, 0.165, 0.000, 0.000, 0.000, 0.065],
   [0.840, 0.000, 0.710, 0.060, 0.125, 0.060, 0.000, 0.070, 0.065, 0.000, 0.165, 0.000, 0.000, 0.000, 0.070],
   [0.860, 0.710, 0.000, 0.055, 0.120, 0.055, 0.000, 0.070, 0.065, 0.000, 0.000, 0.000, 0.000, 0.000, 0.065],
   [0.115, 0.060, 0.055, 0.000, 0.885, 0.455, 0.415, 0.060, 0.150, 0.050, 0.240, 0.000, 0.000, 0.065, 0.140],
   [0.150, 0.125, 0.120, 0.885, 0.000, 0.510, 0.330, 0.125, 0.165, 0.050, 0.145, 0.000, 0.000, 0.000, 0.200],
   [0.055, 0.060, 0.055, 0.455, 0.510, 0.000, 0.335, 0.060, 0.215, 0.050, 0.140, 0.000, 0.000, 0.000, 0.085],
   [0.000, 0.000, 0.000, 0.415, 0.330, 0.335, 0.000, 0.000, 0.245, 0.060, 0.255, 0.125, 0.000, 0.075, 0.225],
   [0.070, 0.070, 0.070, 0.060, 0.125, 0.060, 0.000, 0.000, 0.195, 0.000, 0.000, 0.000, 0.000, 0.000, 0.140],
   [0.065, 0.065, 0.065, 0.150, 0.165, 0.215, 0.245, 0.195, 0.000, 0.045, 0.135, 0.000, 0.000, 0.000, 0.155],
   [0.000, 0.000, 0.000, 0.050, 0.050, 0.050, 0.060, 0.000, 0.045, 0.000, 0.000, 0.120, 0.000, 0.045, 0.080],
   [0.165, 0.165, 0.000, 0.240, 0.145, 0.140, 0.255, 0.000, 0.135, 0.000, 0.000, 0.000, 0.000, 0.150, 0.150],
   [0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.125, 0.000, 0.000, 0.120, 0.000, 0.000, 0.175, 0.090, 0.105],
   [0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.175, 0.000, 0.000, 0.000],
   [0.000, 0.000, 0.000, 0.065, 0.000, 0.000, 0.075, 0.000, 0.000, 0.045, 0.150, 0.090, 0.000, 0.000, 0.000],
   [0.065, 0.070, 0.065, 0.140, 0.200, 0.085, 0.225, 0.140, 0.155, 0.080, 0.150, 0.105, 0.000, 0.000, 0.000]
   ])

   clusterids, error, nfound = kmedoids(distances, 6)
   print "Cluster ids:", clusterids
   print "error:", error
   print "nfound:", nfound

   cities_in_cluster = {}
   for name, clusterid in zip(names, clusterids):
       cities_in_cluster.setdefault(clusterid, []).append(name)

   import textwrap
   for centroid_id, city_names in cities_in_cluster.items():
       print "Cluster around", names[centroid_id]
       text = ", ".join(city_names)
       for line in textwrap.wrap(text, 70):
           print "  ", line

   colors = ['red', 'green', 'blue', 'yellow', 'white', 'black']

   medoids = {}
   for i in clusterids:
       medoids[i] = medoids.get(i, 0) + 1

   plt.scatter(distances[:,0],distances[:,1], c=colors)
   plt.show()

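There are two problems with this code:
- Each execution gives a different clustering. Is that expected?
- The plot shows only 11 points instead of 15.

Where is the error?

Thanks.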

Best answer

kmedoids uses random initialization and may converge to a local minimum.

So yes, if you run it multiple times, you will get different results.
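
To illustrate why random starts lead to different answers, here is a minimal sketch in plain NumPy (Python 3). It is not Pycluster's implementation: `kmedoids_once` is a hypothetical helper that runs the common alternating k-medoids heuristic from a random start, and the toy points are made up for the example. Running it from several seeds and keeping the lowest-error run is the usual remedy; Pycluster's `kmedoids` documents an `npass` argument for the same purpose.

```python
import numpy as np

def kmedoids_once(D, k, rng, max_iter=100):
    """One run of the alternating k-medoids heuristic from a random start.

    Hypothetical helper for illustration only, not Pycluster's code.
    Returns (clusterids, error), where clusterids[i] is the index of the
    medoid that item i is assigned to.
    """
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)  # random initial medoids
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign each item to its nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster
                cost = D[np.ix_(members, members)].sum(axis=0)
                new_medoids[j] = members[np.argmin(cost)]
        if np.array_equal(new_medoids, medoids):    # no medoid changed: converged
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    clusterids = medoids[labels]
    error = float(D[np.arange(n), clusterids].sum())
    return clusterids, error

# Hypothetical toy data: three loose groups of 2-D points
points = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [5.0, 5.0], [5.2, 4.9], [4.8, 5.3],
                   [9.0, 0.0], [9.1, 0.4], [8.7, 0.2]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Ten independent runs, each from a different random start
runs = [kmedoids_once(D, 3, np.random.default_rng(seed)) for seed in range(10)]
errors = [err for _, err in runs]
best_ids, best_error = min(runs, key=lambda r: r[1])

print("error per run:", np.round(errors, 3))
print("best cluster ids:", best_ids, "error:", round(best_error, 3))
```

If the per-run errors differ, some starts ended in a worse local minimum than others, which is the behaviour described above.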

Is it possible that your distance matrix does not actually contain distances?

It has too many 0 values.

The row

[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.175, 0.000, 0.000, 0.000]

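is an extreme example. Judging by your matrix, all points are essentially identical, because from any one point you can reach any other point through a chain of 0 distances! Your matrix is therefore not a distance matrix. This violation of basic distance properties may well break kmedoids and cause it to return essentially random results.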
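
The chain-of-zeros claim can be checked directly. The sketch below (Python 3 with NumPy) copies the matrix from the question, treats every 0 entry as a link between two items, and counts the groups of items connected through such links. `zero_components` is a hypothetical helper name, not part of Pycluster.

```python
from collections import deque

import numpy as np

# The 15 x 15 matrix from the question (rows A1..A15)
distances = np.array([
    [0.000, 0.840, 0.860, 0.115, 0.150, 0.055, 0.000, 0.070, 0.065, 0.000, 0.165, 0.000, 0.000, 0.000, 0.065],
    [0.840, 0.000, 0.710, 0.060, 0.125, 0.060, 0.000, 0.070, 0.065, 0.000, 0.165, 0.000, 0.000, 0.000, 0.070],
    [0.860, 0.710, 0.000, 0.055, 0.120, 0.055, 0.000, 0.070, 0.065, 0.000, 0.000, 0.000, 0.000, 0.000, 0.065],
    [0.115, 0.060, 0.055, 0.000, 0.885, 0.455, 0.415, 0.060, 0.150, 0.050, 0.240, 0.000, 0.000, 0.065, 0.140],
    [0.150, 0.125, 0.120, 0.885, 0.000, 0.510, 0.330, 0.125, 0.165, 0.050, 0.145, 0.000, 0.000, 0.000, 0.200],
    [0.055, 0.060, 0.055, 0.455, 0.510, 0.000, 0.335, 0.060, 0.215, 0.050, 0.140, 0.000, 0.000, 0.000, 0.085],
    [0.000, 0.000, 0.000, 0.415, 0.330, 0.335, 0.000, 0.000, 0.245, 0.060, 0.255, 0.125, 0.000, 0.075, 0.225],
    [0.070, 0.070, 0.070, 0.060, 0.125, 0.060, 0.000, 0.000, 0.195, 0.000, 0.000, 0.000, 0.000, 0.000, 0.140],
    [0.065, 0.065, 0.065, 0.150, 0.165, 0.215, 0.245, 0.195, 0.000, 0.045, 0.135, 0.000, 0.000, 0.000, 0.155],
    [0.000, 0.000, 0.000, 0.050, 0.050, 0.050, 0.060, 0.000, 0.045, 0.000, 0.000, 0.120, 0.000, 0.045, 0.080],
    [0.165, 0.165, 0.000, 0.240, 0.145, 0.140, 0.255, 0.000, 0.135, 0.000, 0.000, 0.000, 0.000, 0.150, 0.150],
    [0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.125, 0.000, 0.000, 0.120, 0.000, 0.000, 0.175, 0.090, 0.105],
    [0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.175, 0.000, 0.000, 0.000],
    [0.000, 0.000, 0.000, 0.065, 0.000, 0.000, 0.075, 0.000, 0.000, 0.045, 0.150, 0.090, 0.000, 0.000, 0.000],
    [0.065, 0.070, 0.065, 0.140, 0.200, 0.085, 0.225, 0.140, 0.155, 0.080, 0.150, 0.105, 0.000, 0.000, 0.000],
])

def zero_components(D):
    """Group items that are reachable from one another through 0 entries (breadth-first search)."""
    n = len(D)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        component, queue = [start], deque([start])
        while queue:
            i = queue.popleft()
            for j in np.flatnonzero(D[i] == 0):  # every item at distance 0 from item i
                if not seen[j]:
                    seen[j] = True
                    component.append(int(j))
                    queue.append(j)
        components.append(component)
    return components

components = zero_components(distances)
print(len(components))  # 1: all 15 items are linked through 0 entries
```

A single group means that, read as distances, the matrix treats all 15 items as the same point. One plausible reading, which the question does not confirm, is that the values are similarities rather than distances.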

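Also, do not draw a scatter plot of a distance matrix. Scatter plots are for the input data, not for the first two columns of your distance matrix. If you want to reconstruct a scatter plot from a distance matrix, use multidimensional scaling.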
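
The sketch below (Python 3 with NumPy) covers both points. First, it counts the distinct (x, y) pairs that the question's `plt.scatter(distances[:,0], distances[:,1])` call would draw, which likely explains the 11 visible points. Second, it shows classical multidimensional scaling on a made-up toy example. `classical_mds` is a hypothetical helper written for this illustration, and the toy rectangle is not from the question, since the question's matrix is not a valid distance matrix.

```python
import numpy as np

# First two columns of the question's matrix, one (x, y) pair per item A1..A15
first_two_columns = np.array([
    [0.000, 0.840], [0.840, 0.000], [0.860, 0.710], [0.115, 0.060], [0.150, 0.125],
    [0.055, 0.060], [0.000, 0.000], [0.070, 0.070], [0.065, 0.065], [0.000, 0.000],
    [0.165, 0.165], [0.000, 0.000], [0.000, 0.000], [0.000, 0.000], [0.065, 0.070],
])
distinct = np.unique(first_two_columns, axis=0)
print(len(distinct))  # 11: five items share the position (0, 0) and overlap in the plot

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n items in k dimensions from an n x n distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                 # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]         # indices of the k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Hypothetical toy data: corners of a 3 x 4 rectangle and their true distances
toy_points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
toy_D = np.linalg.norm(toy_points[:, None, :] - toy_points[None, :, :], axis=-1)

coords = classical_mds(toy_D, k=2)
recovered_D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(toy_D, recovered_D))  # True: the embedding reproduces the distances
```

The returned `coords` can be passed to `plt.scatter(coords[:, 0], coords[:, 1])`. The layout is determined only up to rotation and reflection, so it may look mirrored or turned relative to the original points.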

This post, python - Problem with PyCluster, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/30485216/
