
python - sklearn KMedoids returns empty clusters

Reposted. Author: 行者123. Updated: 2023-12-04 09:43:42

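I'm using KMedoids from sklearn_extra.cluster. I use it with a precomputed distance matrix (metric='precomputed'), and it used to work. However, we found a mistake in the way the distance matrix was calculated, so we had to implement the calculation ourselves. Since then, the KMedoids algorithm no longer works. This is the output: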

C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 1 is empty! self.labels_[self.medoid_indices_[1]] may not be labeled with its corresponding cluster (1).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 2 is empty! self.labels_[self.medoid_indices_[2]] may not be labeled with its corresponding cluster (2).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 3 is empty! self.labels_[self.medoid_indices_[3]] may not be labeled with its corresponding cluster (3).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 4 is empty! self.labels_[self.medoid_indices_[4]] may not be labeled with its corresponding cluster (4).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 5 is empty! self.labels_[self.medoid_indices_[5]] may not be labeled with its corresponding cluster (5).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 6 is empty! self.labels_[self.medoid_indices_[6]] may not be labeled with its corresponding cluster (6).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 7 is empty! self.labels_[self.medoid_indices_[7]] may not be labeled with its corresponding cluster (7).
warnings.warn(
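
One thing worth ruling out when these warnings appear with a precomputed matrix is that the data contains fewer distinct points than requested clusters, which can easily happen with boolean records. The sketch below is only a diagnostic under that assumption; `check_distance_matrix` and the toy records are hypothetical names and data, not part of sklearn_extra.

```python
import numpy as np


def check_distance_matrix(dist, n_clusters=8):
    """Return basic sanity facts about a precomputed distance matrix."""
    dist = np.asarray(dist, dtype=float)
    if dist.ndim != 2 or dist.shape[0] != dist.shape[1]:
        raise ValueError("distance matrix must be square")
    # Identical rows belong to points that a distance-based algorithm
    # cannot tell apart (for example duplicated records).
    n_distinct = np.unique(np.round(dist, 12), axis=0).shape[0]
    return {
        "symmetric": bool(np.allclose(dist, dist.T)),
        "zero_diagonal": bool(np.allclose(np.diag(dist), 0.0)),
        "finite": bool(np.isfinite(dist).all()),
        "n_distinct_points": int(n_distinct),
        "enough_distinct_points": bool(n_distinct >= n_clusters),
    }


# Toy example: four boolean records, only two distinct patterns.
records = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=bool)
toy_dist = (records[:, None, :] != records[None, :, :]).mean(axis=2)
report = check_distance_matrix(toy_dist, n_clusters=3)
print(report)
```

If `enough_distinct_points` is false, no initialization can fill every cluster, and a smaller `n_clusters` is the only remedy.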

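I checked the distance matrix: it is a 2D NumPy array of shape n_data x n_data with zeros on the diagonal, so that should not be the problem. All values lie between 0 and 1. We used to use this algorithm for the Gower distance (the gower package from PyPI), but it does not work when, for some reason, we only have categorical data. All our values are bools. The Gower distance call fails as follows: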
File "C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\gower\gower_dist.py", line 62, in gower_matrix
Z_num = np.divide(Z_num ,num_max,out=np.zeros_like(Z_num), where=num_max!=0)
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode '?') according to the casting rule ''same_kind''
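
For a table that contains only boolean (categorical) columns, Gower's coefficient with equal weights and no missing values scores each column 0 for a match and 1 for a mismatch and then averages over the columns, which is the Hamming distance. A minimal NumPy sketch under those assumptions follows; `boolean_gower_matrix` is a hypothetical helper, not a function of the gower package.

```python
import numpy as np


def boolean_gower_matrix(data):
    """Pairwise share of mismatching columns for an all-boolean table."""
    x = np.asarray(data, dtype=bool)
    # The comparison has shape (n, n, n_features); averaging over the
    # last axis gives the fraction of columns in which two rows differ.
    return (x[:, None, :] != x[None, :, :]).mean(axis=2)


toy = np.array(
    [
        [True, False, True],
        [True, True, True],
        [False, True, False],
    ]
)
d = boolean_gower_matrix(toy)
print(d)
```

The intermediate array grows with n² times the number of columns, so for larger data `scipy.spatial.distance.pdist(x, metric="hamming")` followed by `squareform` is likely the more economical route to the same values.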

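I also tried pyclustering's KMedoids, and that does work. However, with pyclustering you have to define the initial medoids yourself, and the method I found does not work on categorical data (see below).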
initial_medoids = kmeans_plusplus_initializer(data, n_clus, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize(return_index=True)

Stack trace:
File "path_to_file", line 19, in <module>
initial_medoids = kmeans_plusplus_initializer(data, n_clus, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize(return_index=True)
File "path\Python\Python38-32\lib\site-packages\pyclustering\cluster\center_initializer.py", line 357, in initialize
index_point = self.__get_next_center(centers)
File "path\Python\Python38-32\lib\site-packages\pyclustering\cluster\center_initializer.py", line 256, in __get_next_center
distances = self.__calculate_shortest_distances(self.__data, centers)
File "path\Python\Python38-32\lib\site-packages\pyclustering\cluster\center_initializer.py", line 236, in __calculate_shortest_distances
dataset_differences[index_center] = numpy.sum(numpy.square(data - center), axis=1).T
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
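
The traceback above comes from subtracting boolean arrays, so one way around it is to choose the initial medoids from the distance matrix instead of from the raw records. The sketch below uses a deterministic farthest-first rule, which is a swapped-in technique and not pyclustering's randomized k-means++ initializer; `farthest_first_medoids` is a hypothetical helper.

```python
import numpy as np


def farthest_first_medoids(dist, n_clusters, first=None):
    """Pick medoid indices that are spread out, using only distances."""
    dist = np.asarray(dist, dtype=float)
    if first is None:
        # Start from the most central point (smallest total distance).
        first = int(np.argmin(dist.sum(axis=1)))
    medoids = [first]
    # Distance from every point to its nearest medoid chosen so far.
    nearest = dist[first].copy()
    while len(medoids) < n_clusters:
        candidate = int(np.argmax(nearest))
        if nearest[candidate] == 0.0:
            raise ValueError("fewer distinct points than requested medoids")
        medoids.append(candidate)
        nearest = np.minimum(nearest, dist[candidate])
    return medoids


# Toy example: four points on a line at positions 0, 1, 10 and 11.
positions = np.array([0.0, 1.0, 10.0, 11.0])
toy_dist = np.abs(positions[:, None] - positions[None, :])
initial_medoids = farthest_first_medoids(toy_dist, n_clusters=2)
print(initial_medoids)
```

As far as I know, pyclustering's `kmedoids` accepts such index lists together with `data_type='distance_matrix'`, but that should be checked against the installed version.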

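My problem could be solved in any of three ways, so I hope someone can help with one of them:
  • Someone knows why sklearn's KMedoids isn't working and can help me fix it, so I can use it.
  • Someone knows what I'm doing wrong with the Gower function from PyPI, so I can use either pyclustering or sklearn.
  • Someone knows an easy way to find initial medoids for pyclustering, so I can use pyclustering.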

I've posted a simplified version of my code below.
    import pandas as pd
    import gower_distance  # our own module; calcDist is assumed to be defined there
    from sklearn_extra.cluster import KMedoids

    data = pd.read_csv(path_to_data)  # path_to_data: placeholder for the CSV path
    dist = gower_distance.calcDist(data)  # returns an NxN array, N = number of data points
    # I'm using 8 clusters, which is the default, so I haven't defined it
    kmedoids = KMedoids(metric='precomputed').fit(dist)
    labels = kmedoids.predict(dist)

Best answer

I got that warning too (although with Euclidean distance). Using a different initialization of the cluster centers fixed it for me:

    kmedoids = KMedoids(metric='precomputed', init='k-medoids++').fit(dist)
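
A possible reason the initialization matters: as I understand the sklearn_extra documentation, the default `init='heuristic'` picks the points with the smallest total distance to all others. With many duplicated records those points can coincide, and every sample then lands in one cluster. The sketch below imitates that rule in plain NumPy on toy data; it is an illustration under this assumption, not the library's code.

```python
import numpy as np

# Toy data: six boolean records, but only two distinct patterns.
records = np.array([[1, 0]] * 4 + [[0, 1]] * 2, dtype=bool)
dist = (records[:, None, :] != records[None, :, :]).mean(axis=2)
n_clusters = 2


def cluster_sizes(dist, medoids):
    """Assign each point to its nearest medoid and count the members."""
    labels = np.argmin(dist[:, medoids], axis=1)
    return np.bincount(labels, minlength=len(medoids))


# Assumed "smallest total distance" rule: both picks are duplicates.
central = np.argsort(dist.sum(axis=1), kind="stable")[:n_clusters]
sizes_central = cluster_sizes(dist, central)

# Spread-out medoids (one per pattern) leave no cluster empty.
spread = np.array([0, 4])
sizes_spread = cluster_sizes(dist, spread)

print(central, sizes_central)
print(spread, sizes_spread)
```

Ties in the nearest-medoid step go to the first medoid, which is why the second cluster stays empty in the first case.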

Regarding python - sklearn KMedoids returns empty clusters, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62215324/
