
python - How to resample on multiple columns and create a time series of value_counts() and count()?

Reposted. Author: 行者123. Updated: 2023-12-05 04:46:22

I have the following dataframe:

    Client_Id        Date    Age_Group  Gender
0      579427  2020-02-01     Under 65  Female
1      579464  2020-02-01     Under 65  Female
2      579440  2020-02-01     Under 65    Male
3      579470  2020-02-01      75 - 79  Female
4      579489  2020-02-01      75 - 79  Female
5      579424  2020-02-01      75 - 79    Male
6      579492  2020-02-01      75 - 79    Male
7      579552  2020-02-01      75 - 79    Male
8      579439  2020-02-01      80 - 84    Male
9      579445  2020-03-01      80 - 84  Female
10     579496  2020-03-01      80 - 84  Female
11     579569  2020-03-01      80 - 84    Male
12     579610  2020-03-01      80 - 84    Male
13     579450  2020-03-01      80 - 84  Female
14     579423  2020-03-01  85 and over  Female
15     579428  2020-03-01  85 and over    Male

I am trying to resample and get a time series of the Client_Id count, the Gender counts, and the Age_Group counts.

For example, I can get the value_counts of Gender:

df.set_index('Date').resample('D')['Gender'].value_counts()

Date        Gender
2020-02-01  Male      5
            Female    4
2020-03-01  Female    4
            Male      3

I can also get the value_counts of Age_Group in the same way.
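
As a rough sketch of the Age_Group case, the same pattern can be followed by unstack so that each age group becomes its own column. The frame below is rebuilt from the dictionary given further down (only the two columns needed here), and `fill_value=0` is an assumption about how missing groups should appear, not something stated in the question.

```python
import pandas as pd

# Dates and age groups taken from the dictionary in the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02"] * 9 + ["2020-01-03"] * 7),
    "Age_Group": (["Under 65"] * 3 + ["75 - 79"] * 5
                  + ["80 - 84"] * 6 + ["85 and over"] * 2),
})

# Daily value_counts, then move the Age_Group level into the columns.
# fill_value=0 (an assumption) shows groups absent on a day as 0 instead of NaN.
age_counts = (
    df.set_index("Date")
      .resample("D")["Age_Group"]
      .value_counts()
      .unstack(fill_value=0)
)
print(age_counts)
```

Without `fill_value`, a group that does not occur on a given day comes out as NaN and the columns become floats.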

I can get the number of clients per day:

df.set_index('Date').resample('D')['Client_Id'].count()

Date
2020-01-02 9
2020-01-03 7

However, I want all of these outputs in a single dataframe, with each counted value as its own column.

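I have managed to do it. The result, originally shown as an image, has one row per date, with the client count and every Gender and Age_Group value in its own column.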

But the code is very ugly. I have more columns to process, and I don't want such a long chain of merge calls.

This is what I did using unstack and merge:

(df.set_index('Date').resample('D')['Client_Id'].count().to_frame()
.merge(df.set_index('Date').resample('D')['Gender'].value_counts().unstack(), left_index=True, right_index=True)
.merge(df.set_index('Date').resample('D')['Age_Group'].value_counts().unstack(), left_index=True, right_index=True))

Is there a simpler, tidier, or built-in way to do this?

My dataframe as a dictionary:

{'Client_Id': {0: 579427,
1: 579464,
2: 579440,
3: 579470,
4: 579489,
5: 579424,
6: 579492,
7: 579552,
8: 579439,
9: 579445,
10: 579496,
11: 579569,
12: 579610,
13: 579450,
14: 579423,
15: 579428},
'Date': {0: Timestamp('2020-01-02 00:00:00'),
1: Timestamp('2020-01-02 00:00:00'),
2: Timestamp('2020-01-02 00:00:00'),
3: Timestamp('2020-01-02 00:00:00'),
4: Timestamp('2020-01-02 00:00:00'),
5: Timestamp('2020-01-02 00:00:00'),
6: Timestamp('2020-01-02 00:00:00'),
7: Timestamp('2020-01-02 00:00:00'),
8: Timestamp('2020-01-02 00:00:00'),
9: Timestamp('2020-01-03 00:00:00'),
10: Timestamp('2020-01-03 00:00:00'),
11: Timestamp('2020-01-03 00:00:00'),
12: Timestamp('2020-01-03 00:00:00'),
13: Timestamp('2020-01-03 00:00:00'),
14: Timestamp('2020-01-03 00:00:00'),
15: Timestamp('2020-01-03 00:00:00')},
'Age_Group': {0: 'Under 65',
1: 'Under 65',
2: 'Under 65',
3: '75 - 79',
4: '75 - 79',
5: '75 - 79',
6: '75 - 79',
7: '75 - 79',
8: '80 - 84',
9: '80 - 84',
10: '80 - 84',
11: '80 - 84',
12: '80 - 84',
13: '80 - 84',
14: '85 and over',
15: '85 and over'},
'Gender': {0: 'Female ',
1: 'Female ',
2: 'Male ',
3: 'Female ',
4: 'Female ',
5: 'Male ',
6: 'Male ',
7: 'Male ',
8: 'Male ',
9: 'Female ',
10: 'Female ',
11: 'Male ',
12: 'Male ',
13: 'Female ',
14: 'Female ',
15: 'Male '}}
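
To turn a dictionary like this back into a dataframe, something along these lines should work. It is only a sketch: the data is shortened to the first three rows, the name `data` is made up for the example, and stripping the trailing space from Gender is an assumption about what is wanted.

```python
import pandas as pd
from pandas import Timestamp  # the dictionary refers to Timestamp by its bare name

# First three rows of the dictionary above; paste the full dictionary in practice
data = {
    "Client_Id": {0: 579427, 1: 579464, 2: 579440},
    "Date": {0: Timestamp("2020-01-02 00:00:00"),
             1: Timestamp("2020-01-02 00:00:00"),
             2: Timestamp("2020-01-02 00:00:00")},
    "Age_Group": {0: "Under 65", 1: "Under 65", 2: "Under 65"},
    "Gender": {0: "Female ", 1: "Female ", 2: "Male "},
}

# Outer keys become columns, inner keys become the row index
df = pd.DataFrame(data)

# Assumption: the trailing space in the Gender values is unwanted
df["Gender"] = df["Gender"].str.strip()
print(df)
```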

Best answer

Use Series.unstack so that df1 is indexed by a DatetimeIndex, just like df2, which makes it possible to combine them with concat:

import pandas as pd

# df1: daily Gender counts, one column per value; df2: daily client count
df1 = df.set_index('Date').resample('D')['Gender'].value_counts().unstack()
df2 = df.set_index('Date').resample('D')['Client_Id'].count()
df = pd.concat([df1, df2], axis=1)
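
Since the question mentions more columns to process, the same idea can probably be extended by building the pieces in a loop and concatenating once. This is a sketch rather than part of the original answer: the names `daily`, `parts` and `result` are made up, Gender is written without the trailing space found in the dictionary, and `fill_value=0` is an assumption about how absent values should be shown.

```python
import pandas as pd

# Data rebuilt from the dictionary in the question (Gender without trailing space)
df = pd.DataFrame({
    "Client_Id": [579427, 579464, 579440, 579470, 579489, 579424, 579492, 579552,
                  579439, 579445, 579496, 579569, 579610, 579450, 579423, 579428],
    "Date": pd.to_datetime(["2020-01-02"] * 9 + ["2020-01-03"] * 7),
    "Age_Group": (["Under 65"] * 3 + ["75 - 79"] * 5
                  + ["80 - 84"] * 6 + ["85 and over"] * 2),
    "Gender": ["Female", "Female", "Male", "Female", "Female", "Male", "Male", "Male",
               "Male", "Female", "Female", "Male", "Male", "Female", "Female", "Male"],
})

# Resample once and reuse the object for every column
daily = df.set_index("Date").resample("D")

# Plain count first, then one block of columns per categorical column
parts = [daily["Client_Id"].count()]
parts += [daily[col].value_counts().unstack(fill_value=0)
          for col in ["Gender", "Age_Group"]]

# All pieces share the same DatetimeIndex, so they line up along axis=1
result = pd.concat(parts, axis=1)
print(result)
```

Adding another categorical column then only means adding its name to the list, with no further merge calls.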

Regarding python - How to resample on multiple columns and create a time series of value_counts() and count()?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68845408/
