gpt4 book ai didi

python - 有效比较 Pandas Dataframe 中的行数据

转载 作者:行者123 更新时间:2023-12-01 03:58:33 26 4
gpt4 key购买 nike

我有一个 CSV 文件,其中包含每月手机账单(没有特定的顺序),我将其读入 Pandas Dataframe 中。我想为每个帐单添加一列,显示该帐单与同一帐户的上一个帐单有何不同。此 CSV 只是我的数据的子集。我的代码工作正常,但当您查看接近一百万行的 CSV 文件时,代码非常草率且非常慢。

我应该做什么来提高效率?

CSV:

Account Number,Bill Month,Bill Amount
4543,3/1/2015,300
4543,1/1/2015,100
4543,2/1/2015,200
2322,1/1/2015,22
2322,3/1/2015,38
2322,2/1/2015,25

Python:

import numpy as np
import pandas as pd
data = pd.read_csv('data.csv', low_memory=False)

# sort my data and reset the index so I can use index and index - 1 in the loop
data = data.sort_values(by=['Account Number', 'Bill Month'])
data = data.reset_index(drop=True)

# add a blank column for the difference
data['Difference'] = np.nan

for index, row in data.iterrows():

# special handling for the first row so I don't get negative indexes
if index == 0:
data.ix[index, 'Difference'] = "-"
else:
# if the account in the current row and the row before are the same, then compare Bill Amounts
if data.ix[index, 'Account Number'] == data.ix[index - 1, 'Account Number']:
data.ix[index, 'Difference'] = data.ix[index, 'Bill Amount'] - data.ix[index - 1, 'Bill Amount']
else:
data.ix[index, 'Difference'] = "-"

print data

期望的输出:

   Account Number Bill Month  Bill Amount Difference
0 2322 1/1/2015 22 -
1 2322 2/1/2015 25 3
2 2322 3/1/2015 38 13
3 4543 1/1/2015 100 -
4 4543 2/1/2015 200 100
5 4543 3/1/2015 300 100

最佳答案

试试这个:

In [37]: df = df.sort_values(['Account Number','Bill Month'])

In [38]: df['Difference'] = (df.groupby(['Account Number'])['Bill Amount']
....: .diff()
....: .fillna('-')
....: )

In [39]: df
Out[39]:
Account Number Bill Month Bill Amount Difference
3 2322 2015-01-01 22 -
5 2322 2015-02-01 25 3
4 2322 2015-03-01 38 13
1 4543 2015-01-01 100 -
2 4543 2015-02-01 200 100
0 4543 2015-03-01 300 100

说明:

diff() 将分别应用于每个组 - 它将返回“下一个”值与当前值之间的差异:

In [123]: df.groupby(['Account Number'])['Bill Amount'].diff()
Out[123]:
3 NaN
5 3.0
4 13.0
1 NaN
2 100.0
0 100.0
dtype: float64

fillna('-') - 用指定值填充所有 NaN:-:

In [124]: df.groupby(['Account Number'])['Bill Amount'].diff().fillna('-')
Out[124]:
3 -
5 3
4 13
1 -
2 100
0 100
dtype: object

关于python - 有效比较 Pandas Dataframe 中的行数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37014715/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com