
python - Converting sets to lists with the Pandas groupby agg function causes 'ValueError: Function does not reduce'

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:45:46

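Sometimes it seems that the more I use Python (and Pandas), the less I understand. So I apologise if I'm simply not seeing the wood for the trees here, but I've been going round in circles and just can't see what I'm doing wrong.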

Basically, I have an example script (which I'd like to apply to a much larger dataframe), but I can't get it to work to my satisfaction.

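The dataframe consists of columns of various data types. I'd like to group the dataframe on two columns and then produce a new dataframe that contains a list of all the unique values of each variable in each group. (Ultimately, I'd like to concatenate the list items into a single string, but that is a different question.)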

The initial script I used is:

import numpy as np
import pandas as pd

def tempFuncAgg(tempVar):
    tempList = set(tempVar.dropna()) # Drop NaNs and create set of unique values
    print(tempList)
    return tempList

# Define dataframe
tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"],
'gender': ["male","female","female","male","male","female","female",np.nan,"male","male","female","male","female","female","male","female","male","female",np.nan,"male"],
'age': ["young","old","old","old","old","old",np.nan,"old","old","young","young","old","young","young","old",np.nan,"old","young",np.nan,np.nan]})

# Groupby based on 2 categorical variables
tempGroupby = tempDF.groupby(['gender','age'])

# Aggregate for each variable in each group using function defined above
dfAgg = tempGroupby.agg(lambda x: tempFuncAgg(x))
print(dfAgg)

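The output of this script is as expected: a series of lines showing the sets of values, followed by a dataframe containing the returned sets: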

{'09/04/2015 23:03', '21/04/2015 12:59', '06/04/2015 12:34'}
{'01/06/2015 11:09', '12/05/2015 14:19', '27/05/2015 22:31', '19/06/2015 05:37'}
{'15/04/2015 07:12', '19/05/2015 19:22', '06/05/2015 11:12', '04/06/2015 12:57', '15/06/2015 03:23', '12/04/2015 01:00'}
{'02/04/2015 02:34', '10/05/2015 08:52'}
{2, 3, 6}
{18, 11, 13, 14}
{4, 5, 9, 12, 15, 17}
{1, 10}
                                                         date  \
gender age
female old    set([09/04/2015 23:03, 21/04/2015 12:59, 06/04...
       young  set([01/06/2015 11:09, 12/05/2015 14:19, 27/05...
male   old    set([15/04/2015 07:12, 19/05/2015 19:22, 06/05...
       young          set([02/04/2015 02:34, 10/05/2015 08:52])

                                      id
gender age
female old                set([2, 3, 6])
       young       set([18, 11, 13, 14])
male   old    set([4, 5, 9, 12, 15, 17])
       young                set([1, 10])

The problem occurs when I try to convert the set to a list. Bizarrely, it prints the same list twice and then fails with 'ValueError: Function does not reduce'.

def tempFuncAgg(tempVar):
    tempList = list(set(tempVar.dropna())) # This is the only difference
    print(tempList)
    return tempList


tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"],
'gender': ["male","female","female","male","male","female","female",np.nan,"male","male","female","male","female","female","male","female","male","female",np.nan,"male"],
'age': ["young","old","old","old","old","old",np.nan,"old","old","young","young","old","young","young","old",np.nan,"old","young",np.nan,np.nan]})

tempGroupby = tempDF.groupby(['gender','age'])

dfAgg = tempGroupby.agg(lambda x: tempFuncAgg(x))
print(dfAgg)

But the output is now:

['09/04/2015 23:03', '21/04/2015 12:59', '06/04/2015 12:34']
['09/04/2015 23:03', '21/04/2015 12:59', '06/04/2015 12:34']
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
ValueError: Function does not reduce

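Any help with troubleshooting this problem would be greatly appreciated, and I apologise in advance if it is something obvious that I'm just not seeing.

EDIT: Incidentally, converting the set to a tuple rather than a list works without any problem.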

Best answer

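Lists can sometimes cause strange problems in pandas. You can either:

  1. Use tuples (as you have already noticed)

  2. If you really need lists, do the conversion in a second operation, like this: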

    dfAgg.applymap(lambda x: list(x))
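
The first option (tuples) can be sketched as follows. This is a minimal illustration rather than the original poster's script: `df` holds only the first eight rows of the question's data, the helper name `unique_tuple` is made up for this example, and the values are sorted purely so that the output is reproducible.

```python
import numpy as np
import pandas as pd

# First eight rows of the question's data (kept small for illustration)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "date": ["02/04/2015 02:34", "06/04/2015 12:34", "09/04/2015 23:03",
             "12/04/2015 01:00", "15/04/2015 07:12", "21/04/2015 12:59",
             "29/04/2015 17:33", "04/05/2015 10:44"],
    "gender": ["male", "female", "female", "male", "male", "female",
               "female", np.nan],
    "age": ["young", "old", "old", "old", "old", "old", np.nan, "old"],
})

def unique_tuple(values):
    # Hypothetical helper: drop NaNs, deduplicate, sort, and return a tuple
    return tuple(sorted(set(values.dropna())))

# Rows whose group key is NaN are dropped by groupby by default
df_agg = df.groupby(["gender", "age"]).agg(unique_tuple)
print(df_agg)
```

Each cell of `df_agg` then holds a tuple, which can still be turned into a list afterwards if a list is really required.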

Full example of the second approach:

import numpy as np
import pandas as pd

def tempFuncAgg(tempVar):
    tempList = set(tempVar.dropna()) # Drop NaNs and create set of unique values
    print(tempList)
    return tempList

# Define dataframe
tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"],
'gender': ["male","female","female","male","male","female","female",np.nan,"male","male","female","male","female","female","male","female","male","female",np.nan,"male"],
'age': ["young","old","old","old","old","old",np.nan,"old","old","young","young","old","young","young","old",np.nan,"old","young",np.nan,np.nan]})

# Groupby based on 2 categorical variables
tempGroupby = tempDF.groupby(['gender','age'])

# Aggregate for each variable in each group using function defined above
dfAgg = tempGroupby.agg(lambda x: tempFuncAgg(x))

# Convert each set to a list (applymap returns a new dataframe, so keep the result)
dfAgg = dfAgg.applymap(lambda x: list(x))

print(dfAgg)
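
Two details of the second step are worth spelling out. `applymap` returns a new dataframe instead of changing the existing one, so its result has to be stored. Also, pandas 2.1 deprecated `DataFrame.applymap` in favour of `DataFrame.map`. The sketch below is one way to cope with both, assuming a small frame built from the first eight rows of the question's data; the names `df_sets`, `elementwise` and `df_lists` are made up for this example.

```python
import numpy as np
import pandas as pd

# First eight rows of the question's data (kept small for illustration)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "date": ["02/04/2015 02:34", "06/04/2015 12:34", "09/04/2015 23:03",
             "12/04/2015 01:00", "15/04/2015 07:12", "21/04/2015 12:59",
             "29/04/2015 17:33", "04/05/2015 10:44"],
    "gender": ["male", "female", "female", "male", "male", "female",
               "female", np.nan],
    "age": ["young", "old", "old", "old", "old", "old", np.nan, "old"],
})

# First pass: aggregate each group into a set of unique, non-NaN values
df_sets = df.groupby(["gender", "age"]).agg(lambda s: set(s.dropna()))

# DataFrame.map exists from pandas 2.1 onwards; older versions only have applymap
elementwise = df_sets.map if hasattr(df_sets, "map") else df_sets.applymap

# Second pass: sorted() turns every set into a list; keep the returned frame
df_lists = elementwise(sorted)
print(df_lists)
```

Using `sorted` instead of `list` is a choice made here only to get a stable element order; `list` works the same way if the order does not matter.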

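pandas has quite a few quirks like this one, and it is usually better to carry on with a workaround (such as this one) than to look for the perfect solution.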
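
Since the question mentions eventually joining each group's values into a single string, one possible shortcut is to build that string inside the aggregation itself. A string is a single scalar value per group, so the list-related problem should not come up. This is only a sketch under assumptions: the frame is the first eight rows of the question's data, and `join_unique` with its `sep` parameter is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd

# First eight rows of the question's data (kept small for illustration)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "date": ["02/04/2015 02:34", "06/04/2015 12:34", "09/04/2015 23:03",
             "12/04/2015 01:00", "15/04/2015 07:12", "21/04/2015 12:59",
             "29/04/2015 17:33", "04/05/2015 10:44"],
    "gender": ["male", "female", "female", "male", "male", "female",
               "female", np.nan],
    "age": ["young", "old", "old", "old", "old", "old", np.nan, "old"],
})

def join_unique(values, sep=", "):
    # Hypothetical helper: drop NaNs, deduplicate, convert to str, sort, join.
    # Sorting happens on the strings, so numbers are ordered lexicographically.
    return sep.join(sorted({str(v) for v in values.dropna()}))

df_joined = df.groupby(["gender", "age"]).agg(join_unique)
print(df_joined)
```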

This post is based on a similar question found on Stack Overflow about python - Converting sets to lists with the Pandas groupby agg function causes 'ValueError: Function does not reduce': https://stackoverflow.com/questions/36322217/
