
python - Aggregating rows in Pandas

Reposted. Author: 行者123. Updated: 2023-11-28 21:36:18

I'm new to pandas. I need to aggregate rows that share the same 'Name', taking the mean of 'Rating' and 'NumsHelpful' (ignoring NaN). 'Review' should be concatenated, while 'Weight(Pounds)' should stay unchanged:

col names: ['Brand', 'Name', 'NumsHelpful', 'Rating', 'Weight(Pounds)', 'Review']

'Brand' 'Name'
1534 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1535 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1536 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1537 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1538 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1539 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz
1540 Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz

'NumsHelpful' 'Rating' 'Weight'
1534 NaN 2 4.5
1535 NaN 2 4.5
1536 NaN NaN 4.5
1537 NaN NaN 4.5
1538 2 NaN 4.5
1539 3 5 4.5
1540 5 NaN 4.5

'Review'
1534 Yummy - Delish
1535 The best Bloody Mary mix! - The best Bloody Ma...
1536 Best Taste by far - I've tried several if not ...
1537 Best bloody mary mix ever - This is also good ...
1538 Outstanding - Has a small kick to it but very ...
1539 OMG! So Good! - Spicy, terrific Bloody Mary mix!
1540 Good stuff - This is the best

So the output should look like this:
 'Brand'                'Name'                   'NumsHelpful'    'Rating' 
Zing Zang Zing Zang Bloody Mary Mix, 32 fl oz 3.33 3

'Weight' 'Review'
4.5 Review1 / Review2 / ... / ReviewN

How should I go about this? Thanks.

Best answer

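Use DataFrameGroupBy.agg with a dictionary that maps each column to an aggregation function. The Weight and Brand columns are aggregated with first, which takes the first value of each group: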

d = {'NumsHelpful':'mean', 
'Review':'/'.join,
'Weight':'first',
'Brand':'first',
'Rating':'mean'}
df = df.groupby('Name').agg(d).reset_index()
print (df)
Name NumsHelpful \
0 Zing Zang Bloody Mary Mix, 32 fl oz 3.333333

Review Weight Brand \
0 Yummy - Delish/The best Bloody Mary mix! - The... 4.5 Zing Zang

Rating
0 3.0
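For reference, here is a minimal, self-contained sketch of the approach above. It rebuilds the seven sample rows from the question, but the review strings are placeholders (Review1 ... Review7) because the originals are truncated, the weight column is assumed to be named Weight as in the answer rather than Weight(Pounds), and the separator ' / ' follows the question's expected output rather than the answer's '/':

```python
import numpy as np
import pandas as pd

# Sample rows from the question; review texts are placeholders (assumption).
df = pd.DataFrame({
    "Brand": ["Zing Zang"] * 7,
    "Name": ["Zing Zang Bloody Mary Mix, 32 fl oz"] * 7,
    "NumsHelpful": [np.nan, np.nan, np.nan, np.nan, 2, 3, 5],
    "Rating": [2, 2, np.nan, np.nan, np.nan, 5, np.nan],
    "Weight": [4.5] * 7,
    "Review": [f"Review{i}" for i in range(1, 8)],
})

# One aggregation per column: 'mean' skips NaN by default,
# 'first' keeps the first value of each group.
d = {
    "NumsHelpful": "mean",
    "Review": " / ".join,
    "Weight": "first",
    "Brand": "first",
    "Rating": "mean",
}

result = df.groupby("Name").agg(d).reset_index()
print(result)
```

Because 'mean' ignores NaN, NumsHelpful is averaged over 2, 3 and 5 only, and Rating over 2, 2 and 5 only.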

Also, in pandas 0.23.1 you may get this warning:

FutureWarning: 'Name' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version



The solution is to remove the index name Name:
df.index.name = None

Or:
df = df.rename_axis(None)
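A small sketch of what clearing the index name does. The toy frame and the use of set_index(..., drop=False) to reproduce a frame where Name is both the index name and a column are assumptions for illustration, not taken from the question:

```python
import pandas as pd

# Toy data (hypothetical). drop=False keeps 'Name' as a column
# while also using it as the index, so the label is ambiguous.
df = pd.DataFrame({"Name": ["A", "A", "B"], "Rating": [1.0, 3.0, 5.0]})
df = df.set_index("Name", drop=False)
print(df.index.name)

# Clearing the index name leaves 'Name' only as a column label.
cleaned = df.rename_axis(None)
print(cleaned.index.name)

means = cleaned.groupby("Name")["Rating"].mean()
print(means)
```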

Another possible solution is to not aggregate with first, and instead add those columns to the groupby:
d = {'NumsHelpful':'mean',  'Review':'/'.join, 'Rating':'mean'}
df = df.groupby(['Name', 'Weight','Brand']).agg(d).reset_index()

Both solutions return the same output if Weight and Brand have the same value throughout each group.
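To illustrate that condition, here is a rough comparison on a toy frame (the data and variable names are hypothetical). While Weight and Brand are constant within each Name, the two approaches agree; once a Name has two different weights, grouping by all three columns splits it into separate rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Brand": ["X", "X", "Y"],
    "Weight": [1.0, 1.0, 2.0],
    "Rating": [2.0, 4.0, 5.0],
    "Review": ["r1", "r2", "r3"],
})

d_first = {"Rating": "mean", "Review": " / ".join,
           "Weight": "first", "Brand": "first"}
d_keys = {"Rating": "mean", "Review": " / ".join}
keys = ["Name", "Weight", "Brand"]
cols = ["Name", "Brand", "Weight", "Rating", "Review"]

# Approach 1: group by Name only, keep the first Weight/Brand.
by_name = df.groupby("Name").agg(d_first).reset_index()
# Approach 2: make Weight and Brand part of the group key.
by_keys = df.groupby(keys).agg(d_keys).reset_index()
same = by_name[cols].equals(by_keys[cols])
print(same)

# Give product A a second, different weight: the results now diverge.
df2 = df.copy()
df2.loc[1, "Weight"] = 1.5
rows_by_name = len(df2.groupby("Name").agg(d_first))
rows_by_keys = len(df2.groupby(keys).agg(d_keys))
print(rows_by_name, rows_by_keys)
```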

Edit:

If you need to convert a string (object) column to numeric, first try converting it with astype:
df['Weight(Pounds)'] = df['Weight(Pounds)'].astype(float)

If that fails, use to_numeric with the parameter errors='coerce' to convert unparseable strings to NaN:
df['Weight(Pounds)'] = pd.to_numeric(df['Weight(Pounds)'], errors='coerce')
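A short sketch of the difference between the two conversions. The sample values, including the unparseable 'n/a', are made up for illustration:

```python
import pandas as pd

# Hypothetical weights stored as strings, one of them not a number.
weights = pd.Series(["4.5", "2", "n/a"])

# astype(float) fails as soon as one value cannot be parsed.
try:
    weights.astype(float)
    astype_failed = False
except (ValueError, TypeError):
    astype_failed = True

# to_numeric with errors='coerce' turns unparseable values into NaN.
converted = pd.to_numeric(weights, errors="coerce")
print(astype_failed)
print(converted)
```

The resulting NaN values are then skipped by aggregations such as 'mean'.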

Regarding python - Aggregating rows in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51230581/
