python - How to cluster a numpy array of coefficients for a heatmap


Reposted by: 太空宇宙  Updated: 2023-11-03 18:03:12

I'm trying to perform hierarchical clustering on a 2D numpy array so that it looks good when I plot it as a correlation matrix in d3.js.

My data looks like this:

[[ 1.   0.091  0.147 ..., -0.239  0.113  -0.012 ]
 [ 0.091  1.  -0.153 ..., -0.004 -0.244  -0.00520801]
 [ 0.147 -0.153  1.  ..., -0.157  0.013   0.133]
 ..., 
 [-0.239  -0.004 -0.157   ...,  -0.265  -0.362  1. ]]

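I computed these as Pearson correlation coefficients, which range from -1 to 1. As you can see, every entry on the diagonal from the top left to the bottom right of the array is 1, because each variable correlates perfectly with itself.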
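For reference, a Pearson correlation matrix like this one can be computed with NumPy's `np.corrcoef`. This is a minimal sketch assuming the raw data has one variable per row; `raw` is hypothetical random data, not the asker's.

```python
import numpy as np

# Hypothetical raw data: 5 variables (rows), 50 observations each.
rng = np.random.default_rng(0)
raw = rng.standard_normal((5, 50))

# np.corrcoef treats each row as a variable and returns the Pearson
# correlation coefficient for every pair of rows.
corr = np.corrcoef(raw)

print(corr.shape)                     # (5, 5)
print(np.allclose(np.diag(corr), 1))  # True: each variable vs. itself
print(np.allclose(corr, corr.T))      # True: the matrix is symmetric
```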

When I plot these values, my correlation matrix looks like this:

[Image: correlation matrix before clustering]

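After clustering, I'd like it to look somewhat like this, where red represents positive correlation and blue represents negative correlation: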

[Image: target heatmap]

Using matplotlib and scipy, I can cluster the coefficients so that the plot looks like a heatmap, but the values change. I want my values to stay the same.

I used this answer to graph the heatmap in Python, but it's not quite what I want since it changes my values. All I need is to cluster the data and write it out to a CSV/JSON file.

from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, dendrogram

# final_correlation is the 140 x 140 correlation matrix shown above
data_dist = pdist(final_correlation, 'correlation')
# This gives me a 1-D array of distances that is roughly half the size of my
# original correlation matrix. How do I use it to reorder my correlation
# matrix into a clustered matrix?


Out[1]: # The size is 9730, as opposed to the original size of 19,600
[ 0.612  0.503  1.653 ...,  0.792  1.577  0.829]
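One way to use these distances: `linkage` accepts the condensed distance vector from `pdist` directly, so it does not need to be expanded first. `leaves_list` then gives the dendrogram's leaf order, and applying that single order to both rows and columns (here with `np.ix_`) moves the coefficients around without changing any of them. The length also checks out: for n = 140 rows, pdist returns one distance per pair, n(n - 1)/2 = 140 * 139 / 2 = 9730 values. A minimal sketch, assuming `method="average"` (a choice made here, not taken from the question) and small random data standing in for `final_correlation`; the helper names `cluster_order` and `reorder` are hypothetical:

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist


def cluster_order(corr, method="average"):
    """Return a leaf order for the rows/columns of a square correlation matrix."""
    # Condensed distance vector of length n*(n-1)/2: the correlation distance
    # between every pair of rows, as in the question's pdist call.
    dist = pdist(corr, "correlation")
    # linkage accepts the condensed vector directly.
    link = linkage(dist, method=method)
    # Left-to-right order of the leaves in the resulting dendrogram.
    return leaves_list(link)


def reorder(corr, order):
    # Permute rows AND columns with the same order: the values are untouched,
    # only their positions change, so the diagonal of 1s is preserved.
    return corr[np.ix_(order, order)]


# Hypothetical example data standing in for final_correlation.
rng = np.random.default_rng(1)
final_correlation = np.corrcoef(rng.standard_normal((6, 40)))

order = cluster_order(final_correlation)
clustered = reorder(final_correlation, order)

print(pdist(final_correlation, "correlation").shape)  # (15,), i.e. 6 * 5 / 2
print(np.allclose(np.diag(clustered), 1))             # True: diagonal still all 1s
print(np.array_equal(np.sort(clustered, axis=None),
                     np.sort(final_correlation, axis=None)))  # True: same values
```

The `'correlation'` metric compares the correlation profiles of two rows. Another common choice is to treat `1 - r` itself as the distance and convert that square matrix to condensed form with `scipy.spatial.distance.squareform(1 - corr, checks=False)` before calling `linkage`.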

Update: If anyone knows R, what I'm trying to do is probably similar to this

Best answer

Sorry for not giving a complete example, but I found a way to cluster the data, although the result isn't as good as I'd like:

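Suppose you have a CSV file containing the correlations and a header row. Copy the contents of the CSV file to the clipboard and use the following code: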

import scipy.cluster.hierarchy as hc
import pandas
from matplotlib import pyplot

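# Copy the CSV contents (header row + index column) to the clipboard first
d = pandas.read_clipboard(sep=",", index_col=0)
# Convert the column headers to int (assumes they are numeric IDs)
d.columns = [int(x) for x in d.columns]

# Hierarchical clustering: each row of the correlation matrix is treated as
# an observation vector (Euclidean distance, centroid linkage)
link = hc.linkage(d.values, method='centroid')
# Leaf order of the resulting dendrogram
o1 = hc.leaves_list(link)

# Reorder the rows, then the columns in reversed leaf order
# (this makes the diagonal of 1s run from the top right to the bottom left)
mat = d.iloc[o1, :]
mat = mat.iloc[:, o1[::-1]]
pyplot.imshow(mat)
pyplot.show()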
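The question also asks for the clustered matrix as a CSV/JSON file rather than a plot. The sketch below keeps the answer's `centroid` linkage but applies the same leaf order to rows and columns, so the diagonal of 1s stays in place and every label still matches its own row and column. Assumptions: the CSV is inlined with `io.StringIO` as hypothetical stand-in data, and the `{"labels": ..., "matrix": ...}` JSON layout is an arbitrary choice for feeding d3.js, not a format d3 requires.

```python
import io
import json

import pandas as pd
import scipy.cluster.hierarchy as hc

# Hypothetical CSV with a header row and an index column, standing in for
# the correlation file in the answer (pandas.read_clipboard also works).
csv_text = """,a,b,c,d
a,1.0,0.9,-0.2,-0.1
b,0.9,1.0,-0.3,-0.2
c,-0.2,-0.3,1.0,0.8
d,-0.1,-0.2,0.8,1.0
"""
d = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Same linkage call as the answer: rows are treated as observation vectors.
link = hc.linkage(d.values, method="centroid")
order = hc.leaves_list(link)

# One order for both axes: a pure permutation, so no value changes.
mat = d.iloc[order, order]

# CSV with row/column labels; pass a filename, e.g. mat.to_csv("clustered.csv"),
# to write it to disk instead of returning a string.
csv_out = mat.to_csv()

# JSON for d3.js, using a layout chosen here for illustration.
payload = {"labels": list(mat.index), "matrix": mat.values.tolist()}
json_out = json.dumps(payload)
print(json_out)
```

With this toy data, `a` and `b` end up next to each other, as do `c` and `d`, because those pairs have the most similar rows.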

This produces a result like the following: [Image: Imgur]