
python - Remove rows from a DataFrame where the group sum is zero

Reposted. Author: 行者123. Last updated: 2023-12-05 04:14:20

I have already removed the rows where a company was not charged anything in a given period (i.e., rows where revenue == 0).

Here is an example of the billing data:

import numpy as np
import pandas as pd

# Sample billing data: one row per account, billing type, and period.
data = {
    'account_id': ['111','111','222','333','666','666','111','222','333','666','666'],
    'company': ['initech','initech','jackson steinem & co','ingen','enron','enron','initech','jackson steinem & co','ingen','enron','enron'],
    'billing_type': ['subscription','discount','subscription','subscription','subscription','discount','subscription','subscription','subscription','subscription','discount'],
    'period': ['2012-10-31','2012-10-31','2012-10-31','2012-10-31','2012-10-31','2012-10-31','2012-11-30','2012-11-30','2012-11-30','2012-11-30','2012-11-30'],
    'revenue': [39.95,-39.95,199.95,299.95,499.95,-499.95,39.95,199.95,299.95,499.95,-499.95]
}
df = pd.DataFrame(data)
# Parse the period strings into datetime64 values.
df['period'] = pd.to_datetime(df['period'], format='%Y-%m-%d')

This produces a DataFrame like this:

In [16]: df
Out[16]:
    account_id  billing_type               company      period  revenue
0          111  subscription               initech  2012-10-31    39.95
1          111      discount               initech  2012-10-31   -39.95
2          222  subscription  jackson steinem & co  2012-10-31   199.95
3          333  subscription                 ingen  2012-10-31   299.95
4          666  subscription                 enron  2012-10-31   499.95
5          666      discount                 enron  2012-10-31  -499.95
6          111  subscription               initech  2012-11-30    39.95
7          222  subscription  jackson steinem & co  2012-11-30   199.95
8          333  subscription                 ingen  2012-11-30   299.95
9          666  subscription                 enron  2012-11-30   499.95
10         666      discount                 enron  2012-11-30  -499.95

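What I need to do is remove the rows where the revenue for a given company/period sums to zero. So, for example, I need to remove all rows for Enron, but only the October 2012 period for Initech: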

In [17]: df.groupby(['company','period'])['revenue'].sum()
Out[17]:
company               period
enron                 2012-10-31      0.00
                      2012-11-30      0.00
ingen                 2012-10-31    299.95
                      2012-11-30    299.95
initech               2012-10-31      0.00
                      2012-11-30     39.95
jackson steinem & co  2012-10-31    199.95
                      2012-11-30    199.95

A number of other posts deal with similar cases, but I could not find anything that helps with, or explains, exactly how to do this.

Best answer

You can use transform to build a frame-sized mask, which you can then use to select rows:

>>> keep = df.groupby(["company", "period"])["revenue"].transform("sum") != 0
>>> df.loc[keep]
   account_id  billing_type               company      period  revenue
2         222  subscription  jackson steinem & co  2012-10-31   199.95
3         333  subscription                 ingen  2012-10-31   299.95
6         111  subscription               initech  2012-11-30    39.95
7         222  subscription  jackson steinem & co  2012-11-30   199.95
8         333  subscription                 ingen  2012-11-30   299.95

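This works because transform takes the groupby result and "broadcasts" it back onto the original index, so every row carries the total of its own group: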

>>> df.groupby(["company", "period"])["revenue"].transform("sum")
0       0.00
1       0.00
2     199.95
3     299.95
4       0.00
5       0.00
6      39.95
7     199.95
8     299.95
9       0.00
10      0.00
dtype: float64
>>> df.groupby(["company", "period"])["revenue"].transform("sum") != 0
0     False
1     False
2      True
3      True
4     False
5     False
6      True
7      True
8      True
9     False
10    False
dtype: bool
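
The mask approach above can be wrapped in a small reusable function. The following is only a sketch under stated assumptions: the helper name drop_zero_sum_groups and its atol parameter are made up for illustration, the data is a reduced copy of the question's example, and the comparison uses np.isclose instead of != 0 on the assumption that real billing totals may carry floating-point round-off. It also shows groupby.filter, a different technique from the answer's transform, which should select the same rows.

```python
import numpy as np
import pandas as pd

# Reduced copy of the billing data from the question.
df = pd.DataFrame({
    "company": ["initech", "initech", "ingen", "enron", "enron", "initech"],
    "period": pd.to_datetime(["2012-10-31", "2012-10-31", "2012-10-31",
                              "2012-10-31", "2012-10-31", "2012-11-30"]),
    "revenue": [39.95, -39.95, 299.95, 499.95, -499.95, 39.95],
})


def drop_zero_sum_groups(frame, keys, value, atol=1e-9):
    """Hypothetical helper: drop rows whose group total of `value` is ~0."""
    # transform("sum") yields one total per original row, aligned to frame.index.
    totals = frame.groupby(keys)[value].transform("sum")
    # np.isclose guards against float round-off; atol is an assumed tolerance.
    is_zero = np.isclose(totals.to_numpy(), 0.0, atol=atol)
    return frame.loc[~is_zero]


kept = drop_zero_sum_groups(df, ["company", "period"], "revenue")

# Alternative technique: groupby.filter keeps every row of each group
# for which the predicate returns True.
kept_via_filter = df.groupby(["company", "period"]).filter(
    lambda g: not np.isclose(g["revenue"].sum(), 0.0, atol=1e-9)
)

print(kept)
```

The transform version stays vectorized, whereas filter calls a Python function once per group, so the mask is usually the better fit for frames with many groups.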

Regarding "python - Remove rows from a DataFrame where the group sum is zero", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35122184/
