
python - Upsampling an int series within groups of a Pandas DataFrame

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:17:48

My question is about how to upsample an int series for each of several groupings in a data frame (in my case, for each "Team" and "LeadWeek" grouping).

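I see built-in functions and many examples for upsampling time series, but none for upsampling integers. For various reasons I won't go into here, I want to do this with integers rather than with a time series.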

In my case I have "Teams" and "LeadWeeks", and for every "Team" and "LeadWeek" combination I want to upsample "Conversion Weeks" to [0, 1, 2, 3, 4].

I think there is a way to do this with a MultiIndex/groupby + resample(), but after hours of tinkering I haven't been able to figure it out. Asking the experts here for help...
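For reference, resample() needs datetime-like values (a DatetimeIndex, TimedeltaIndex or PeriodIndex, or a datetime column passed via on=), so it cannot upsample integer weeks directly. The groupby idea can still be expressed by reindexing each group's ConversionWeek values onto a fixed grid. This is a minimal sketch of that, assuming the target grid is always 0 to 4 and that each week appears at most once per group (reindex raises on duplicate labels). The names WEEKS and pieces are illustrative. Unlike a Cartesian product, it only fills Team/LeadWeek pairs that already occur in the data.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Team A", pd.Timestamp(2017, 12, 1), 0, 2],
        ["Team A", pd.Timestamp(2017, 12, 1), 2, 1],
        ["Team A", pd.Timestamp(2017, 12, 1), 4, 1],
        ["Team A", pd.Timestamp(2017, 12, 8), 3, 2],
        ["Team B", pd.Timestamp(2017, 12, 1), 0, 1],
        ["Team B", pd.Timestamp(2017, 12, 1), 2, 3],
        ["Team B", pd.Timestamp(2017, 12, 8), 1, 3],
        ["Team B", pd.Timestamp(2017, 12, 8), 3, 2],
    ],
    columns=["Team", "LeadWeek", "ConversionWeek", "Conversions"],
)

WEEKS = range(5)  # assumed target grid: ConversionWeek 0..4

pieces = []
for (team, lead_week), grp in df.groupby(["Team", "LeadWeek"]):
    # Integer "upsampling": reindex this group's weeks onto the full grid,
    # giving weeks with no rows a count of 0
    filled = (
        grp.set_index("ConversionWeek")["Conversions"]
        .reindex(WEEKS, fill_value=0)
    )
    pieces.append(pd.DataFrame({
        "Team": team,            # scalar, broadcast to every row
        "LeadWeek": lead_week,   # scalar, broadcast to every row
        "ConversionWeek": filled.index.to_numpy(),
        "Conversions": filled.to_numpy(),
    }))

result = pd.concat(pieces, ignore_index=True)
print(result)
```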

Here is the sample data frame:

import pandas as pd

# pd.datetime is deprecated (and gone in newer pandas); pd.Timestamp builds the same dates
df = pd.DataFrame(
    [
        ['Team A', pd.Timestamp(2017, 12, 1), 0, 2],
        ['Team A', pd.Timestamp(2017, 12, 1), 2, 1],
        ['Team A', pd.Timestamp(2017, 12, 1), 4, 1],
        ['Team A', pd.Timestamp(2017, 12, 8), 3, 2],
        ['Team B', pd.Timestamp(2017, 12, 1), 0, 1],
        ['Team B', pd.Timestamp(2017, 12, 1), 2, 3],
        ['Team B', pd.Timestamp(2017, 12, 8), 1, 3],
        ['Team B', pd.Timestamp(2017, 12, 8), 3, 2],
    ],
    columns=['Team', 'LeadWeek', 'ConversionWeek', 'Conversions'],
)

My desired output is below: each Team/LeadWeek grouping has 5 "ConversionWeek" rows, numbered 0 to 4:

    Team    LeadWeek    ConversionWeek  Conversions
0   Team A  2017-12-01  0               2.0
1   Team A  2017-12-01  1               0.0
2   Team A  2017-12-01  2               1.0
3   Team A  2017-12-01  3               0.0
4   Team A  2017-12-01  4               1.0
5   Team A  2017-12-08  0               0.0
6   Team A  2017-12-08  1               0.0
7   Team A  2017-12-08  2               0.0
8   Team A  2017-12-08  3               2.0
9   Team A  2017-12-08  4               0.0
10  Team B  2017-12-01  0               1.0
11  Team B  2017-12-01  1               0.0
12  Team B  2017-12-01  2               3.0
13  Team B  2017-12-01  3               0.0
14  Team B  2017-12-01  4               0.0
15  Team B  2017-12-08  0               0.0
16  Team B  2017-12-08  1               3.0
17  Team B  2017-12-08  2               0.0
18  Team B  2017-12-08  3               2.0
19  Team B  2017-12-08  4               0.0

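I do have a solution, but it isn't very Pythonic. It's the same way I would solve it in SQL: build a "scaffold" from the Cartesian product of all the distinct key values, then join my actual conversions onto it. In Python, this approach uses itertools.product().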

My solution looks like this:

import pandas as pd
import numpy as np
import itertools as it

df = pd.DataFrame(
    [
        ['Team A', pd.Timestamp(2017, 12, 1), 0, 2],
        ['Team A', pd.Timestamp(2017, 12, 1), 2, 1],
        ['Team A', pd.Timestamp(2017, 12, 1), 4, 1],
        ['Team A', pd.Timestamp(2017, 12, 8), 3, 2],
        ['Team B', pd.Timestamp(2017, 12, 1), 0, 1],
        ['Team B', pd.Timestamp(2017, 12, 1), 2, 3],
        ['Team B', pd.Timestamp(2017, 12, 8), 1, 3],
        ['Team B', pd.Timestamp(2017, 12, 8), 3, 2],
    ],
    columns=['Team', 'LeadWeek', 'ConversionWeek', 'Conversions'],
)

# Target conversion weeks: [0, 1, 2, 3, 4]
ConversionWeek = np.linspace(0, 4, 5, dtype=int)

# Distinct values of each grouping key
Team = df['Team'].unique()
LeadWeek = df['LeadWeek'].unique()

# Cartesian product of all key values = the "scaffold"
scaffold_raw = []
for i in it.product(Team, LeadWeek, ConversionWeek):
    scaffold_raw.append(i)

scaffold = pd.DataFrame(scaffold_raw, columns=['Team', 'LeadWeek', 'ConversionWeek'])

# Left-join the actual conversions onto the scaffold (joins on the shared columns)
new_frame = scaffold.merge(df, how='left')

new_frame = new_frame.sort_values(by=['Team', 'LeadWeek', 'ConversionWeek']).reset_index(drop=True)

# Combinations with no conversions come back as NaN; set them to 0
# (assigning back avoids inplace=True on a single column, which newer pandas warns about)
new_frame['Conversions'] = new_frame['Conversions'].fillna(0)
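The same scaffold-and-join idea can be written without itertools by using merge(how="cross"), which pandas added in version 1.2. This is a sketch under that version assumption, not the asker's code; the helper frame names (teams, lead_weeks, conv_weeks) are illustrative, and ConversionWeek is assumed to run from 0 to 4.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Team A", pd.Timestamp(2017, 12, 1), 0, 2],
        ["Team A", pd.Timestamp(2017, 12, 1), 2, 1],
        ["Team A", pd.Timestamp(2017, 12, 1), 4, 1],
        ["Team A", pd.Timestamp(2017, 12, 8), 3, 2],
        ["Team B", pd.Timestamp(2017, 12, 1), 0, 1],
        ["Team B", pd.Timestamp(2017, 12, 1), 2, 3],
        ["Team B", pd.Timestamp(2017, 12, 8), 1, 3],
        ["Team B", pd.Timestamp(2017, 12, 8), 3, 2],
    ],
    columns=["Team", "LeadWeek", "ConversionWeek", "Conversions"],
)

# One-column frames holding the distinct values of each key
teams = pd.DataFrame({"Team": df["Team"].unique()})
lead_weeks = pd.DataFrame({"LeadWeek": df["LeadWeek"].unique()})
conv_weeks = pd.DataFrame({"ConversionWeek": range(5)})  # assumed grid 0..4

# Cartesian product of the three key sets, like a SQL CROSS JOIN
scaffold = teams.merge(lead_weeks, how="cross").merge(conv_weeks, how="cross")

# Left-join the real rows onto the scaffold; unmatched combinations get 0
result = scaffold.merge(
    df, on=["Team", "LeadWeek", "ConversionWeek"], how="left"
).fillna({"Conversions": 0})
print(result)
```

Because the scaffold is built in key order and a left merge preserves the left frame's row order, no sort_values call is needed; Conversions still comes back as float because unmatched rows pass through NaN before fillna.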

Thanks for any help toward a more elegant solution.

Best answer

Use reindex, passing it a pd.MultiIndex:

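import numpy as np
import pandas as pd

# Every (Team, LeadWeek, ConversionWeek) combination, with ConversionWeek = 0..4
idx = pd.MultiIndex.from_product(
    [df.Team.unique(), df.LeadWeek.unique(), np.arange(5)]
)

# Align the data to the full index; missing combinations become NaN, then 0
v = df.set_index(['Team', 'LeadWeek', 'ConversionWeek'])\
      .reindex(idx)\
      .fillna(0)\
      .reset_index()

# idx has no level names, so restore the original column names
v.columns = df.columns
v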

    Team    LeadWeek    ConversionWeek  Conversions
0   Team A  2017-12-01  0               2.0
1   Team A  2017-12-01  1               0.0
2   Team A  2017-12-01  2               1.0
3   Team A  2017-12-01  3               0.0
4   Team A  2017-12-01  4               1.0
5   Team A  2017-12-08  0               0.0
6   Team A  2017-12-08  1               0.0
7   Team A  2017-12-08  2               0.0
8   Team A  2017-12-08  3               2.0
9   Team A  2017-12-08  4               0.0
10  Team B  2017-12-01  0               1.0
11  Team B  2017-12-01  1               0.0
12  Team B  2017-12-01  2               3.0
13  Team B  2017-12-01  3               0.0
14  Team B  2017-12-01  4               0.0
15  Team B  2017-12-08  0               0.0
16  Team B  2017-12-08  1               3.0
17  Team B  2017-12-08  2               0.0
18  Team B  2017-12-08  3               2.0
19  Team B  2017-12-08  4               0.0
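A variant of the answer's reindex call, sketched under the assumption that you want integer counts rather than floats and that each (Team, LeadWeek, ConversionWeek) combination appears at most once (reindex raises on duplicate labels). Passing names= to MultiIndex.from_product and fill_value=0 to reindex removes both the separate fillna step and the v.columns assignment; the keys variable is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Team A", pd.Timestamp(2017, 12, 1), 0, 2],
        ["Team A", pd.Timestamp(2017, 12, 1), 2, 1],
        ["Team A", pd.Timestamp(2017, 12, 1), 4, 1],
        ["Team A", pd.Timestamp(2017, 12, 8), 3, 2],
        ["Team B", pd.Timestamp(2017, 12, 1), 0, 1],
        ["Team B", pd.Timestamp(2017, 12, 1), 2, 3],
        ["Team B", pd.Timestamp(2017, 12, 8), 1, 3],
        ["Team B", pd.Timestamp(2017, 12, 8), 3, 2],
    ],
    columns=["Team", "LeadWeek", "ConversionWeek", "Conversions"],
)

keys = ["Team", "LeadWeek", "ConversionWeek"]

# Full grid of key combinations; naming the levels lets reset_index()
# restore the column names directly
idx = pd.MultiIndex.from_product(
    [df["Team"].unique(), df["LeadWeek"].unique(), list(range(5))],
    names=keys,
)

# fill_value=0 fills the new rows directly, so no NaN is ever introduced
v = df.set_index(keys).reindex(idx, fill_value=0).reset_index()
print(v)
```

Since the missing rows never hold NaN, Conversions keeps its integer dtype instead of becoming float as in the output above.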

A similar question about upsampling an int series within groups of a Pandas DataFrame can be found on Stack Overflow: https://stackoverflow.com/questions/48142528/
