
python - How to check the correlation between matching columns of two data sets?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:12:05

If we have the data sets:

import pandas as pd
a = pd.DataFrame({"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
b = pd.DataFrame({"A":[45,24,65,65,65], "B":[45,87,65,52,12], "C":[98,52,32,32,12], "D":[0,23,1,365,53], "E":[24,12,65,3,65]})

How can I create a correlation matrix in which the y-axis represents "a" and the x-axis represents "b"?

The aim is to see the correlation between the matching columns of the two data sets, as shown below:

[image from the original post]

Best answer

If you don't mind a NumPy-based vectorized solution, here is one based on the solution post "Computing the correlation coefficient between two multi-dimensional arrays" -

corr2_coeff(a.values.T,b.values.T).T # func from linked solution post.
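The definition of corr2_coeff is not reproduced in this text. As a rough sketch of what such a helper might look like, assuming it computes the Pearson correlation between every row of its first argument and every row of its second (the body below is an illustrative reconstruction, not a quotation from the linked post):

```python
import numpy as np
import pandas as pd


def corr2_coeff(A, B):
    """Pearson correlation between each row of A and each row of B.

    Assumed reconstruction of the helper: A has shape (m, n), B has
    shape (k, n), and the result has shape (m, k).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)

    # Center every row on its own mean.
    A_c = A - A.mean(axis=1, keepdims=True)
    B_c = B - B.mean(axis=1, keepdims=True)

    # Per-row sums of squared deviations.
    ss_A = (A_c ** 2).sum(axis=1)
    ss_B = (B_c ** 2).sum(axis=1)

    # Cross products of centered rows, scaled by the matching norms.
    return (A_c @ B_c.T) / np.sqrt(np.outer(ss_A, ss_B))


a = pd.DataFrame({"A": [34, 12, 78, 84, 26], "B": [54, 87, 35, 25, 82],
                  "C": [56, 78, 0, 14, 13], "D": [0, 23, 72, 56, 14],
                  "E": [78, 12, 31, 0, 34]})
b = pd.DataFrame({"A": [45, 24, 65, 65, 65], "B": [45, 87, 65, 52, 12],
                  "C": [98, 52, 32, 32, 12], "D": [0, 23, 1, 365, 53],
                  "E": [24, 12, 65, 3, 65]})

# Transposing the inputs turns DataFrame columns into rows for the helper.
result = corr2_coeff(a.values.T, b.values.T).T
print(result)
```

With the trailing .T, result[i, j] holds the correlation between column j of a and column i of b, so the rows follow b and the columns follow a.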

Sample run -

In [621]: a
Out[621]:
    A   B   C   D   E
0  34  54  56   0  78
1  12  87  78  23  12
2  78  35   0  72  31
3  84  25  14  56   0
4  26  82  13  14  34

In [622]: b
Out[622]:
    A   B   C    D   E
0  45  45  98    0  24
1  24  87  52   23  12
2  65  65  32    1  65
3  65  52  32  365   3
4  65  12  12   53  65

In [623]: corr2_coeff(a.values.T,b.values.T).T
Out[623]:
array([[ 0.71318502, -0.5923714 , -0.9704441 ,  0.48775228, -0.07401011],
       [ 0.0306753 , -0.0705457 ,  0.48801177,  0.34685977, -0.33942737],
       [-0.26626431, -0.01983468,  0.66110713, -0.50872017,  0.68350413],
       [ 0.58095645, -0.55231196, -0.32053858,  0.38416478, -0.62403866],
       [ 0.01652716,  0.14000468, -0.58238879,  0.12936016,  0.28602349]])
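If labelled output is preferred, with rows named after the columns of a and columns named after the columns of b as the question asks, one possible alternative is to slice the cross block out of np.corrcoef and wrap it in a DataFrame. This is only a sketch that swaps in NumPy's built-in for corr2_coeff; the names full, cross and matching are made up for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"A": [34, 12, 78, 84, 26], "B": [54, 87, 35, 25, 82],
                  "C": [56, 78, 0, 14, 13], "D": [0, 23, 72, 56, 14],
                  "E": [78, 12, 31, 0, 34]})
b = pd.DataFrame({"A": [45, 24, 65, 65, 65], "B": [45, 87, 65, 52, 12],
                  "C": [98, 52, 32, 32, 12], "D": [0, 23, 1, 365, 53],
                  "E": [24, 12, 65, 3, 65]})

n = a.shape[1]

# np.corrcoef stacks the rows of both inputs as variables, giving a
# (2n, 2n) matrix; the upper-right block pairs columns of a with columns of b.
full = np.corrcoef(a.values.T, b.values.T)
cross = pd.DataFrame(full[:n, n:], index=a.columns, columns=b.columns)

# Correlation of identically named columns only (the diagonal of cross).
matching = a.corrwith(b)

print(cross.round(3))
print(matching.round(3))
```

Here cross.loc[x, y] is the correlation between a[x] and b[y], which makes cross the transpose of the array printed in the sample run.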

Regarding "python - How to check the correlation between matching columns of two data sets?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41004952/
