
Python - Groupby a DataFrameGroupBy object

Reposted. Author: 行者123. Last updated: 2023-12-01 09:05:22

I have a pandas DataFrame in Python and I am applying a groupby to it. I then want to apply a new groupby + sum to the result of the first one. More specifically, first I am doing:

  check_df = data_df.groupby(['hotel_code', 'dp_id', 'market', 'number_of_rooms'])[['market', 'number_of_rooms']]

Then I want to do:

check_df = check_df.groupby(['market'])['number_of_rooms'].sum()

As a result, I get the following error:

    AttributeError: Cannot access callable attribute 'groupby' of 'DataFrameGroupBy' objects, try using the 'apply' method

My initial data looks like this:

hotel_code | market | number_of_rooms | ....
---------------------------------------------
001 | a | 200 | ...
001 | a | 200 |
002 | a | 300 | ...

Note that I may have duplicates of pairs such as (a - 200), which is why I need the first groupby. What I ultimately want is something like this:

Market | Rooms
--------------
a | 3000
b | 250

I am simply trying to translate the following SQL query to Python:

select a.market, sum(a.number_of_rooms)
from (
select market, number_of_rooms
from opinmind_dev..cg_mm_booking_dataset_full
group by hotel_code, market, number_of_rooms
) as a
group by market ;

Any ideas on how to solve this? Let me know if you need more information.

P.S. I am new to Python and data science.

Best answer

IIUC, instead of:

check_df = data_df.groupby(['hotel_code', 'dp_id', 'market', 'number_of_rooms'])[['market', 'number_of_rooms']]

you should simply do:

check_df = data_df.drop_duplicates(subset=['hotel_code', 'dp_id', 'market', 'number_of_rooms'])\
.loc[:, ['market', 'number_of_rooms']]\
.groupby('market')\
.sum()
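
For context, the error in the question comes from the fact that `groupby` without an aggregation returns a lazy `DataFrameGroupBy` object rather than a DataFrame, so there is nothing to call a second `groupby` on. Below is a minimal, runnable sketch of the idea. The sample rows and the `dp_id` values are made up for illustration (the question does not show them), and the second variant, which keeps two groupbys to mirror the SQL more literally, is only one possible alternative and not something the original answer proposed.

```python
import pandas as pd

# Hypothetical sample data; the dp_id values are invented for this sketch.
data_df = pd.DataFrame({
    "hotel_code": ["001", "001", "002", "003"],
    "dp_id": [1, 1, 1, 2],
    "market": ["a", "a", "a", "b"],
    "number_of_rooms": [200, 200, 300, 250],
})

keys = ["hotel_code", "dp_id", "market", "number_of_rooms"]

# Variant 1: drop duplicate key combinations, then sum rooms per market.
rooms_by_market = (
    data_df.drop_duplicates(subset=keys)
    .groupby("market")["number_of_rooms"]
    .sum()
)

# Variant 2: aggregate the first groupby (here with size()) so that it
# yields a real DataFrame with one row per key combination, then group again.
deduped = data_df.groupby(keys).size().reset_index(name="n_rows")
rooms_via_groupby = deduped.groupby("market")["number_of_rooms"].sum()

print(rooms_by_market)
# market
# a    500
# b    250
```

Both variants count the duplicated (001, a, 200) row only once, which is what the inner `group by` in the SQL query does.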

Regarding "Python - Groupby a DataFrameGroupBy object", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52105980/
