
python - How to find the correlation between groups of values in a Pandas DataFrame column

Reposted. Author: 太空狗. Updated: 2023-10-29 22:01:33

I have a DataFrame df:

ID  Var1  Var2
1   1.2   4
1   2.1   6
1   3.0   7
2   1.3   8
2   2.1   9
2   3.2   13

For each ID, I want to find the Pearson correlation coefficient between Var1 and Var2.

So the result should look like this:

ID  Corr_Coef
1   0.98198
2   0.97073
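As a sanity check on the expected numbers, the coefficient for ID 1 can be recomputed by hand. This is only a sketch: the arrays are copied from the table above, and the names `var1`, `var2`, `r_manual` and `r_numpy` are illustrative, not part of the question.

```python
import numpy as np

# Values for ID 1, copied from the question's table.
var1 = np.array([1.2, 2.1, 3.0])
var2 = np.array([4.0, 6.0, 7.0])

# Pearson r: sum of products of deviations from the mean,
# divided by the square root of the product of the summed squared deviations.
dx = var1 - var1.mean()
dy = var2 - var2.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
r_numpy = np.corrcoef(var1, var2)[0, 1]

print(round(float(r_manual), 5), round(float(r_numpy), 5))
```

Both values should agree with the 0.98198 listed for ID 1.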

Update:

Make sure that all of the variable columns are of type int or float.
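One way to meet that requirement, sketched here under the assumption that the variables were read in as strings, is to coerce them with `pd.to_numeric`. The string-typed frame and the `value_cols` list are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical frame in which the variable columns arrived as strings.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'Var1': ['1.2', '2.1', '3.0', '1.3', '2.1', '3.2'],
    'Var2': ['4', '6', '7', '8', '9', '13'],
})

value_cols = ['Var1', 'Var2']
# errors='coerce' turns unparseable entries into NaN instead of raising.
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors='coerce')

print(df.dtypes)
```

After this step the columns are numeric, so correlation methods will not silently skip or reject them.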

Best answer

To get the desired output format, you can use .corrwith:

corrs = (df[['Var1', 'ID']]
         .groupby('ID')
         .corrwith(df.Var2)
         .rename(columns={'Var1': 'Corr_Coef'}))

print(corrs)
    Corr_Coef
ID
1     0.98198
2     0.97073
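If the groupby version of `corrwith` is unavailable or flagged as deprecated in your pandas version, the same per-group coefficient can be sketched with `Series.corr` inside `groupby(...).apply`. This is an alternative to the answer's approach, not the answer's own method; the frame is rebuilt from the question's table so the snippet runs on its own.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2, 2],
    'Var1': [1.2, 2.1, 3.0, 1.3, 2.1, 3.2],
    'Var2': [4, 6, 7, 8, 9, 13],
})

# Select only the value columns, then compute Pearson r within each group.
corrs = (df.groupby('ID')[['Var1', 'Var2']]
           .apply(lambda g: g['Var1'].corr(g['Var2']))
           .rename('Corr_Coef')
           .reset_index())

print(corrs)
```

The final `reset_index()` turns ID back into an ordinary column, which matches the two-column layout requested in the question.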

General solution:

import numpy as np
import pandas as pd

def groupby_coef(df, col1, col2, on_index=True, squeeze=True, name='coef',
                 keys=None, **kwargs):
    """Grouped correlation coefficient between two columns

    Flat result structure in contrast to `groupby.corr()`.

    Parameters
    ==========
    df : DataFrame
    col1 & col2: str
        Columns for which to calculate correlation coefs
    on_index : bool, default True
        Specify whether you're grouping on index
    squeeze : bool, default True
        True -> Series; False -> DataFrame
    name : str, default 'coef'
        Name of the resulting Series, or of the DataFrame column
        if squeeze is False
    keys : column label or list of column labels / arrays
        Passed to `pd.DataFrame.set_index`
    **kwargs :
        Passed to `pd.DataFrame.groupby`
    """

    # If we are grouping on something other than the index, then
    # set as index first to avoid hierarchical result.
    # Kludgy, but safer than trying to infer.
    if not on_index:
        df = df.set_index(keys=keys)
    if not kwargs:
        # Assume we're grouping on 0th level of index
        kwargs = {'level': 0}
    grouped = df[[col1]].groupby(**kwargs)
    res = grouped.corrwith(df[col2])
    res.columns = [name]
    if squeeze:
        # Collapse the single-column DataFrame into a Series
        res = np.squeeze(res)
    return res

Example:

df_1 = pd.DataFrame(np.random.randn(10, 2),
                    index=[1]*5 + [2]*5).add_prefix('var')
df_2 = df_1.reset_index().rename(columns={'index': 'var2'})

print(groupby_coef(df_1, 'var0', 'var1', level=0))
1    7.424e-18
2   -9.481e-19
Name: coef, dtype: float64

print(groupby_coef(df_2, col1='var0', col2='var1',
                   on_index=False, keys='var2'))
var2
1    7.424e-18
2   -9.481e-19
Name: coef, dtype: float64

This post on how to find the correlation between groups of values in a Pandas DataFrame column is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/45064916/
