
python - What Pandas data type is passed to transform or apply in a groupby

Reposted. Author: 太空狗. Updated: 2023-10-30 00:07:21

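While trying to debug a function applied through groupby, someone suggested that I use a dummy function to "see what is being passed" into the function for each group. Of course, I'm game: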

import numpy as np
import pandas as pd

np.random.seed(0) # so we can all play along at home

categories = list('abc')
categories = categories * 4
data_1 = np.random.randn(len(categories))
data_2 = np.random.randn(len(categories))

df = pd.DataFrame({'category': categories, 'data_1': data_1, 'data_2': data_2})

def f(x):
    # Dummy function: report the type of the object pandas passes in.
    print(type(x))
    return x

print('single column transform')
df.groupby(['category'])['data_1'].transform(f)
print('\n')

print('single column (nested) transform')
df.groupby(['category'])[['data_1']].transform(f)
print('\n')

print('multiple column transform')
df.groupby(['category'])[['data_1', 'data_2']].transform(f)

print('\n')
print('\n')

print('single column apply')
df.groupby(['category'])['data_1'].apply(f)
print('\n')

print('single column (nested) apply')
df.groupby(['category'])[['data_1']].apply(f)
print('\n')

print('multiple column apply')
df.groupby(['category'])[['data_1', 'data_2']].apply(f)

This puts the following in my standard output:

single column transform
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>


single column (nested) transform
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>


multiple column transform
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>




single column apply
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>


single column (nested) apply
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>


multiple column apply
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>

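So it looks like this:

  • transform
    • single column: 3 Series
    • single column (nested): 2 Series and 3 DataFrames
    • multiple column: 3 Series and 3 DataFrames
  • apply
    • single column: 3 Series
    • single column (nested): 4 DataFrames
    • multiple column: 4 DataFrames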
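
Rather than reading the types off printed output, the calls can also be tallied with a small recorder function. This is only a sketch: `make_recorder`, `frame` and `rng` are hypothetical names that do not appear in the original post, and the counts it reports may differ from the ones listed above depending on the pandas version.

```python
from collections import Counter

import numpy as np
import pandas as pd


def make_recorder():
    """Return (func, counts); func is an identity function that tallies input types."""
    counts = Counter()

    def func(x):
        counts[type(x).__name__] += 1  # e.g. 'Series' or 'DataFrame'
        return x

    return func, counts


# Same shape of data as in the question: 3 categories, 4 rows each.
rng = np.random.RandomState(0)
frame = pd.DataFrame({
    "category": list("abc") * 4,
    "data_1": rng.randn(12),
    "data_2": rng.randn(12),
})

func, counts = make_recorder()
frame.groupby("category")["data_1"].transform(func)
print(dict(counts))
```

Calling `make_recorder()` again before each of the six groupby calls gives one tally per call, which is easier to compare than interleaved print output.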

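What is going on here? Can anyone explain why each of these 6 calls results in the series of objects noted above being passed to the specified function?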

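Best answer

GroupBy.transform tries both a fast_path and a slow_path for your function.

  • fast_path: calls your function with the DataFrame object
  • slow_path: calls your function through DataFrame.apply

When the result of fast_path is the same as that of slow_path, fast_path is chosen.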
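
The selection described above can be modeled with a small toy function. This is a simplified sketch, not pandas' actual implementation: `choose_path_transform`, `record_type`, `toy` and `seen` are hypothetical names, the slow path is written as an explicit loop over columns instead of a call to DataFrame.apply, and it assumes that both paths are tried on the first group only.

```python
import pandas as pd


def choose_path_transform(df, key, func):
    """Toy model of the fast/slow path choice; NOT pandas' real code."""

    def fast_path(group):
        # One call with the whole group as a DataFrame.
        return func(group)

    def slow_path(group):
        # One call per column, each with a Series.
        return pd.DataFrame({col: func(group[col]) for col in group.columns})

    path = None
    pieces = []
    for _, group in df.groupby(key):
        group = group.drop(columns=key)
        if path is None:
            # First group only: run both paths and compare the results.
            result = slow_path(group)
            path = slow_path
            try:
                fast_result = fast_path(group)
                if isinstance(fast_result, pd.DataFrame) and fast_result.equals(result):
                    path, result = fast_path, fast_result
            except Exception:
                pass  # fast path failed, so keep the slow path
        else:
            result = path(group)
        pieces.append(result)
    return pd.concat(pieces).sort_index()


seen = []


def record_type(x):
    seen.append(type(x).__name__)
    return x


toy = pd.DataFrame({
    "category": list("abab"),
    "v1": [1.0, 2.0, 3.0, 4.0],
    "v2": [5.0, 6.0, 7.0, 8.0],
})
out = choose_path_transform(toy, "category", record_type)
print(seen)  # ['Series', 'Series', 'DataFrame', 'DataFrame']
```

In this model the Series calls all come from the slow path on the first group, and every later call receives a DataFrame because the fast path won. That is the same pattern as in the question's transform output, although the model does not reproduce the exact number of Series calls seen there.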

The following output means that it finally chose fast_path:

multiple column transform
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>

Here is the link to the code:

https://github.com/pydata/pandas/blob/master/pandas/core/groupby.py#L2277

Edit

To inspect the call stack:

import numpy as np
import pandas as pd

np.random.seed(0) # so we can all play along at home

categories = list('abc')
categories = categories * 4
data_1 = np.random.randn(len(categories))
data_2 = np.random.randn(len(categories))

df = pd.DataFrame({'category': categories, 'data_1': data_1, 'data_2': data_2})

import traceback
import inspect
import itertools

def f(x):
    # Skip stack entries until the line marked "#stop here" below, so that
    # only the frames from the transform call downward are printed.
    stack = itertools.dropwhile(lambda line: "#stop here" not in line,
                                traceback.format_stack(inspect.currentframe().f_back))
    print("*" * 20)
    print(x)
    print(type(x))
    print()
    print("\n".join(stack))
    return x

df.groupby(['category'])[['data_1', 'data_2']].transform(f) #stop here

Regarding python - What Pandas data type is passed to transform or apply in a groupby, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20671817/
