
python-3.x - How to plot a heatmap for a high-dimensional dataset?

Reposted. Author: 行者123. Updated: 2023-12-01 04:39:39

I would appreciate it if you could tell me how to plot a high-resolution heatmap for a large dataset with about 150 features.

My code is as follows:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

XX = pd.read_csv('Financial Distress.csv')

# Binary target: 1 if 'Financial Distress' <= -0.50, else 0
y = np.array([0 if i > -0.50 else 1 for i in XX['Financial Distress']])
df = XX.iloc[:, 3:87].copy()  # .copy() avoids SettingWithCopyWarning on the next line
df["target_var"] = y
target_var = ["target_var"]

fig, ax = plt.subplots(figsize=(8, 6))
correlation = df.select_dtypes(include=['float64', 'int64']).iloc[:, 1:].corr()
sns.heatmap(correlation, ax=ax, vmax=1, square=True)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title('Correlation matrix')
plt.tight_layout()
plt.show()

k = df.shape[1]  # number of variables for heatmap
fig, ax = plt.subplots(figsize=(9, 9))
corrmat = df.corr()
# Generate a mask for the upper triangle (np.bool was removed in NumPy 1.24; use bool)
mask = np.zeros_like(corrmat, dtype=bool)
mask[np.triu_indices_from(mask)] = True
# Order the columns by their correlation with the target
cols = corrmat.nlargest(k, target_var)[target_var].index
cm = np.corrcoef(df[cols].values.T)
sns.set(font_scale=1.0)
hm = sns.heatmap(cm, mask=mask, cbar=True, annot=True,
                 square=True, fmt='.2f', annot_kws={'size': 7},
                 yticklabels=cols.values,
                 xticklabels=cols.values)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title('Annotated heatmap matrix')
plt.tight_layout()
plt.show()

It works fine, but for datasets with more than 40 features the resulting heatmap is too small.

Thanks in advance,
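The heatmap looks too small because the figure size is fixed (`figsize=(8, 6)` and `(9, 9)` above) while the number of cells grows with the number of features, so each cell shrinks. A minimal sketch of scaling the figure with the feature count follows. `heatmap_figsize` is a hypothetical helper, and the 0.25 inch per cell and 6-inch minimum are assumed values to tune, not anything from the original post.

```python
def heatmap_figsize(n_features, cell_inches=0.25, min_inches=6.0):
    """Return a square (width, height) in inches that grows with the number of features.

    cell_inches and min_inches are assumed defaults, not values from the original post.
    """
    side = max(float(min_inches), n_features * cell_inches)
    return side, side


for n in (10, 40, 150):
    print(n, heatmap_figsize(n))
```

The result can be passed straight to `plt.subplots(figsize=heatmap_figsize(corrmat.shape[0]))`. For about 150 features this gives a 37.5-inch square, which is then rasterized at whatever `dpi` is given to `savefig`.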

Best answer

Adjusting figsize and dpi worked for me.

I modified your code and doubled the size of the heatmap to 165 x 165. It takes a while to render, but the PNG looks good. My backend is "module://ipykernel.pylab.backend_inline".
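Why both settings matter: when a figure is saved, its pixel size is `figsize` (in inches) times `dpi`. With the answer's `figsize=(24, 18)` and `dpi=300`, that is 7200 x 5400 pixels under default savefig settings (no `bbox_inches="tight"`). The sketch below checks the relationship on a small figure by reading the width and height from the PNG header. `png_size` is a hypothetical helper, and the Agg backend is selected only so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files/buffers only
import matplotlib.pyplot as plt


def png_size(fig, dpi):
    """Save `fig` as PNG at `dpi` into memory and return its (width, height) in pixels."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    data = buf.getvalue()
    # PNG layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height
    if data[12:16] != b"IHDR":
        raise ValueError("unexpected PNG layout")
    width = int.from_bytes(data[16:20], "big")
    height = int.from_bytes(data[20:24], "big")
    return width, height


fig, ax = plt.subplots(figsize=(4, 3))  # 4 x 3 inches
ax.imshow([[0.0, 0.5], [0.5, 1.0]], cmap="Blues")
size_100 = png_size(fig, dpi=100)
size_200 = png_size(fig, dpi=200)
plt.close(fig)
print(size_100, size_200)
```

Doubling `dpi` doubles the pixel count along each axis without changing the layout in inches, which is why raising `dpi` makes small annotation text readable when zooming into the saved PNG.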

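As mentioned in my original answer, I'm fairly sure you forgot to close the previous figure object before creating a new one. If you get strange effects, try calling plt.close("all") before fig, ax = plt.subplots().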
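A minimal sketch of that advice: pyplot keeps every figure open until it is explicitly closed, so closing them before creating the next one prevents earlier figures (and the memory they hold) from piling up. `fresh_subplots` is a hypothetical convenience wrapper, not part of matplotlib; `plt.close("all")` and `plt.get_fignums()` are standard pyplot functions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt


def fresh_subplots(**kwargs):
    """Close all open pyplot figures, then return a new (fig, ax) pair."""
    plt.close("all")
    return plt.subplots(**kwargs)


plt.subplots()  # simulate figures left open by earlier code
plt.subplots()
n_before = len(plt.get_fignums())  # at least 2: pyplot keeps them all open

fig, ax = fresh_subplots(figsize=(9, 9))
n_after = len(plt.get_fignums())  # only the new figure remains
print(n_before, n_after)
```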

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

print(plt.get_backend())

# close any existing plots
plt.close("all")

df = pd.read_csv("Financial Distress.csv")
# select out the desired columns
df = df.iloc[:, 3:].select_dtypes(include=['float64', 'int64'])

# copy columns to double size of dataframe
df2 = df.copy()
df2.columns = "c_" + df2.columns
df3 = pd.concat([df, df2], axis=1)

# get the correlation coefficient between the different columns
corr = df3.iloc[:, 1:].corr()
# DataFrame.as_matrix() was removed in pandas 1.0; copy so `corr` stays unmodified
arr_corr = corr.to_numpy(copy=True)
# mask out the top triangle (sns.heatmap leaves NaN cells blank)
arr_corr[np.triu_indices_from(arr_corr)] = np.nan

fig, ax = plt.subplots(figsize=(24, 18))

hm = sns.heatmap(arr_corr, ax=ax, cbar=True, vmin=-0.5, vmax=0.5,
                 fmt='.2f', annot_kws={'size': 3}, annot=True,
                 square=True, cmap=plt.cm.Blues)

ticks = np.arange(corr.shape[0]) + 0.5
ax.set_xticks(ticks)
ax.set_xticklabels(corr.columns, rotation=90, fontsize=8)
ax.set_yticks(ticks)
ax.set_yticklabels(corr.index, rotation=0, fontsize=8)

ax.set_title('correlation matrix')
plt.tight_layout()
plt.savefig("corr_matrix_incl_anno_double.png", dpi=300)
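The answer hides the upper triangle by writing NaN into a copy of the matrix, while the question passes a boolean `mask=` to `sns.heatmap`; seaborn leaves both masked and NaN cells blank. A minimal sketch showing that the two approaches hide exactly the same cells (the diagonal and everything above it). `lower_triangle` and `upper_mask` are hypothetical helper names, and the random data stands in for the CSV.

```python
import numpy as np
import pandas as pd


def lower_triangle(corr: pd.DataFrame) -> np.ndarray:
    """Return a float copy of `corr` with the diagonal and upper triangle set to NaN."""
    arr = corr.to_numpy(dtype=float, copy=True)  # copy, so `corr` itself is unchanged
    arr[np.triu_indices_from(arr)] = np.nan
    return arr


def upper_mask(corr: pd.DataFrame) -> np.ndarray:
    """Boolean mask for sns.heatmap(mask=...): True marks cells to hide."""
    return np.triu(np.ones(corr.shape, dtype=bool))


rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))  # stand-in data
corr = data.corr()
arr = lower_triangle(corr)
print(np.array_equal(np.isnan(arr), upper_mask(corr)))
```

Passing the DataFrame with `mask=upper_mask(corr)` lets seaborn take tick labels from its index and columns. With the default `xticklabels="auto"` it may thin the labels out when they would overlap, so `xticklabels=True, yticklabels=True` is needed to force every label on a 165 x 165 matrix.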

Full image: [image: corr_matrix_anno_double_image]
Zoom of the top-left corner: [image: zoom_of_top_end_image]

For "python-3.x - How to plot a heatmap for a high-dimensional dataset?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50997662/
