
python - How can I speed up filling in dates in Pandas?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:27:44

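I have a DataFrame with four columns: name, account, date, and points.

I need to group by name and account, then fill in each missing date with the points from the most recent earlier date.

I know how to do this, but I don't know how to do it quickly. My actual DataFrame has millions of rows.

Here is a simplified version of the problem. I want the same output, but much faster when filling in a large amount of data.

(The actual data comes from an Excel file.)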

import pandas as pd

data = """
name account date points
Steve e12 2014-02-07 17
Steve e12 2014-02-09 18

Steve g52 2014-02-03 52
Steve g52 2014-02-06 25
Steve g52 2014-02-08 31
Steve g52 2014-02-09 40

Fred g21 2014-02-02 17
Fred g21 2014-02-08 19

Fred g52 2014-02-07 21
Fred g52 2014-02-09 18
"""

dates = pd.date_range("2014-02-01", "2014-02-10")

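def fill_in_dates(part_df):
    # Index the group by date, expand it to the full date range, then forward-fill the gaps.
    part_df.index = pd.DatetimeIndex(part_df.date)
    part_df = part_df.reindex(dates)
    part_df = part_df.ffill()  # fillna(method='ffill') is deprecated in recent pandas
    return part_df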

lines = [line.strip().split() for line in data.splitlines()[2:] if line.strip()]
columns = data.splitlines()[1].split()
df = pd.DataFrame(lines, columns=columns)

df = df.groupby(['name', 'account'], as_index=False).apply(fill_in_dates)

df = df.dropna()
df = df.reset_index()
df.date = df.level_1
df = df.drop(['level_0', 'level_1'], axis=1)

print(df)

This is the output:

     name  account        date  points
0    Fred      g21  2014-02-02      17
1    Fred      g21  2014-02-03      17
2    Fred      g21  2014-02-04      17
3    Fred      g21  2014-02-05      17
4    Fred      g21  2014-02-06      17
5    Fred      g21  2014-02-07      17
6    Fred      g21  2014-02-08      19
7    Fred      g21  2014-02-09      19
8    Fred      g21  2014-02-10      19
9    Fred      g52  2014-02-07      21
10   Fred      g52  2014-02-08      21
11   Fred      g52  2014-02-09      18
12   Fred      g52  2014-02-10      18
13  Steve      e12  2014-02-07      17
14  Steve      e12  2014-02-08      17
15  Steve      e12  2014-02-09      18
16  Steve      e12  2014-02-10      18
17  Steve      g52  2014-02-03      52
18  Steve      g52  2014-02-04      52
19  Steve      g52  2014-02-05      52
20  Steve      g52  2014-02-06      25
21  Steve      g52  2014-02-07      25
22  Steve      g52  2014-02-08      31
23  Steve      g52  2014-02-09      40
24  Steve      g52  2014-02-10      40

Best answer

I think your only option is to call groupby and then reindex each group over a date range:

def reindex(g):
    # Expand one group to a continuous daily index between its first and last date.
    return g.reindex(pd.date_range(g.index.min(), g.index.max()))

# Parse the dates so they can serve as a DatetimeIndex.
df['date'] = pd.to_datetime(df['date'], errors='coerce')
(df.set_index('date')
   .groupby(['name', 'account'])
   .points.apply(reindex)
   .ffill()
   .rename_axis(['name', 'account', 'date'])
   .reset_index())

     name  account        date  points
0    Fred      g21  2014-02-02      17
1    Fred      g21  2014-02-03      17
2    Fred      g21  2014-02-04      17
3    Fred      g21  2014-02-05      17
4    Fred      g21  2014-02-06      17
5    Fred      g21  2014-02-07      17
6    Fred      g21  2014-02-08      19
7    Fred      g52  2014-02-07      21
8    Fred      g52  2014-02-08      21
9    Fred      g52  2014-02-09      18
10  Steve      e12  2014-02-07      17
11  Steve      e12  2014-02-08      17
12  Steve      e12  2014-02-09      18
13  Steve      g52  2014-02-03      52
14  Steve      g52  2014-02-04      52
15  Steve      g52  2014-02-05      52
16  Steve      g52  2014-02-06      25
17  Steve      g52  2014-02-07      25
18  Steve      g52  2014-02-08      31
19  Steve      g52  2014-02-09      40
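
Note that this result stops at each group's last observed date, while the output in the question carries values forward to the end of the fixed range (2014-02-10). If the fixed range matters, one possible alternative, which is not part of the original answer, is to build every (name, account, date) combination with a cross join, left-merge the observed points onto it, and forward-fill within each group. The following is only a minimal sketch under assumptions: the helper name fill_fixed_range and the small sample frame are made up for illustration, how="cross" needs pandas 1.2 or later, and the cross join materializes (number of groups) x (number of dates) rows, so memory use should be checked on the real data.

```python
import pandas as pd

# Small illustrative sample (a subset of the question's data).
df = pd.DataFrame({
    "name": ["Steve", "Steve", "Fred", "Fred"],
    "account": ["e12", "e12", "g21", "g21"],
    "date": pd.to_datetime(
        ["2014-02-07", "2014-02-09", "2014-02-02", "2014-02-08"]
    ),
    "points": [17, 18, 17, 19],
})
dates = pd.date_range("2014-02-01", "2014-02-10")


def fill_fixed_range(frame, dates):
    """Hypothetical helper: forward-fill points per (name, account) over `dates`."""
    # One row per distinct (name, account) pair.
    keys = frame[["name", "account"]].drop_duplicates()
    # Cross join: every pair combined with every date in the range.
    full = keys.merge(pd.DataFrame({"date": dates}), how="cross")
    # Attach the observed points; dates without an observation become NaN.
    merged = full.merge(frame, on=["name", "account", "date"], how="left")
    # Forward-fill inside each group so values never leak across groups.
    merged["points"] = merged.groupby(["name", "account"])["points"].ffill()
    # Rows before a group's first observation are still NaN: drop them.
    out = merged.dropna(subset=["points"]).reset_index(drop=True)
    # Restore the original dtype (the NaNs forced a float column).
    out["points"] = out["points"].astype(frame["points"].dtype)
    return out


result = fill_fixed_range(df, dates)
print(result)
```

This avoids calling a Python function once per group, which may help with millions of rows, but whether it is actually faster than groupby.apply is something to time on your own data.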

Regarding "python - How can I speed up filling in dates in Pandas?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53216215/
