
python - Pandas: groupby apply using numba

Reposted. Author: 行者123. Updated: 2023-12-04 12:33:58

I am using Pandas v1.1.0.
The pandas documentation has a nice example of how to use numba to speed up a rolling.apply() operation:

import pandas as pd
import numpy as np

def mad(x):
    return np.fabs(x - x.mean()).mean()

df = pd.DataFrame({"A": np.random.randn(100_000)},
                  index=pd.date_range('1/1/2000', periods=100_000, freq='T')
).cumsum()

df.rolling(10).apply(mad, engine="numba", raw=True)
I would like to adapt this to a groupby operation:
df['day'] = df.index.day
df.groupby('day').agg(mad)
This works fine.
df.groupby('day').agg(mad, engine='numba')
This raises an error:
---------------------------------------------------------------------------
NumbaUtilError Traceback (most recent call last)
<ipython-input-21-ee23f1eec685> in <module>
----> 1 df.groupby('day').agg(mad, engine='numba')

~\AppData\Local\Continuum\anaconda3\envs\ds-cit-dev\lib\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
939
940 if maybe_use_numba(engine):
--> 941 return self._python_agg_general(
942 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
943 )

~\AppData\Local\Continuum\anaconda3\envs\ds-cit-dev\lib\site-packages\pandas\core\groupby\groupby.py in _python_agg_general(self, func, engine, engine_kwargs, *args, **kwargs)
1068
1069 if maybe_use_numba(engine):
-> 1070 result, counts = self.grouper.agg_series(
1071 obj,
1072 func,

~\AppData\Local\Continuum\anaconda3\envs\ds-cit-dev\lib\site-packages\pandas\core\groupby\ops.py in agg_series(self, obj, func, engine, engine_kwargs, *args, **kwargs)
623
624 if maybe_use_numba(engine):
--> 625 return self._aggregate_series_pure_python(
626 obj, func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
627 )

~\AppData\Local\Continuum\anaconda3\envs\ds-cit-dev\lib\site-packages\pandas\core\groupby\ops.py in _aggregate_series_pure_python(self, obj, func, engine, engine_kwargs, *args, **kwargs)
681
682 if maybe_use_numba(engine):
--> 683 numba_func, cache_key = generate_numba_func(
684 func, engine_kwargs, kwargs, "groupby_agg"
685 )

~\AppData\Local\Continuum\anaconda3\envs\ds-cit-dev\lib\site-packages\pandas\core\util\numba_.py in generate_numba_func(func, engine_kwargs, kwargs, cache_key_str)
215 nopython, nogil, parallel = get_jit_arguments(engine_kwargs)
216 check_kwargs_and_nopython(kwargs, nopython)
--> 217 validate_udf(func)
218 cache_key = (func, cache_key_str)
219 numba_func = NUMBA_FUNC_CACHE.get(

~\AppData\Local\Continuum\anaconda3\envs\ds-cit-dev\lib\site-packages\pandas\core\util\numba_.py in validate_udf(func)
177 or udf_signature[:min_number_args] != expected_args
178 ):
--> 179 raise NumbaUtilError(
180 f"The first {min_number_args} arguments to {func.__name__} must be "
181 f"{expected_args}"

NumbaUtilError: The first 2 arguments to mad must be ['values', 'index']
My guess is that with engine='numba' the function is expected to receive its data in a slightly different form.

Best answer

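I ran into this problem myself. Apparently, to use the pandas numba engine, the custom function has to be written with the signature f(values, index).
According to the documentation (GroupBy.transform):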

If the 'numba' engine is chosen, the function must be a user defined function with values and index as the first and second arguments respectively in the function signature. Each group’s index will be passed to the user defined function and optionally available for use.
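
The error message in the traceback ("The first 2 arguments to mad must be ['values', 'index']") suggests that the check looks at parameter names. The following is a minimal sketch of a similar check using the standard library's inspect module. The helper name has_numba_signature is hypothetical and is not part of pandas; it only illustrates why mad(x) is rejected while mad(values, index) is accepted.

```python
import inspect

# Parameter names taken from the NumbaUtilError message in the question.
EXPECTED_ARGS = ["values", "index"]


def has_numba_signature(func):
    """Hypothetical helper (not a pandas API): True if the first two
    parameters of func are named 'values' and 'index'."""
    params = list(inspect.signature(func).parameters)
    return params[: len(EXPECTED_ARGS)] == EXPECTED_ARGS


def mad_old(x):
    # Single-argument form, as in the question.
    return abs(x - x.mean()).mean()


def mad_new(values, index):
    # Two-argument form; index is accepted but not used.
    return abs(values - values.mean()).mean()


print(has_numba_signature(mad_old))  # False
print(has_numba_signature(mad_new))  # True
```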


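I had a simple function f(x) that returns a single number and that I wanted to use in a groupby. All it took to make it work with numba was changing the function to f(values, index), so that the numba routine has a valid parameter through which to pass the index to the function.
Previous function (works fine, but not with numba):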
def equal_weight(arr) -> float:
    '''
    returns a float of 1/n where 'n' is the number of rows
    '''
    return 1 / len(arr)
New function, compatible with the numba engine:
def equal_weight(values, index) -> float:
    '''
    returns a float of 1/n where 'n' is the number of rows
    '''
    return 1 / len(values)
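
The same change can be applied to the mad function from the question. The sketch below is one possible adaptation and has not been taken from the original answer. It assumes a default integer index with a separate "day" column (instead of the DatetimeIndex used in the question) and freq="min" for the date range, both chosen only to keep the example simple and portable across pandas versions. The numba call is wrapped in a try/except so the example still runs when numba is not installed.

```python
import numpy as np
import pandas as pd


def mad(values, index):
    # values: NumPy array with one group's data
    # index:  that group's index as a NumPy array (required by the
    #         numba engine's signature, not used here)
    return np.fabs(values - values.mean()).mean()


n = 10_000
# Assumption: day-of-month keys stored in a column, default RangeIndex.
days = pd.date_range("1/1/2000", periods=n, freq="min").day
df = pd.DataFrame({"A": np.random.randn(n).cumsum(), "day": days})

# Reference result computed in plain Python, without numba.
expected = df.groupby("day")["A"].agg(
    lambda s: mad(s.to_numpy(), s.index.to_numpy())
)

try:
    # Needs the optional numba dependency.
    result = df.groupby("day")["A"].agg(mad, engine="numba")
except ImportError:
    result = None  # numba not installed; only the reference is available

print(expected.head())
```

The first call with engine="numba" includes compilation time, so any speedup only shows up on repeated calls or on larger inputs.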

Regarding python - Pandas: groupby apply using numba, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63254419/
