
python - Cumulative running percentage within each group, with each group sorted in descending order

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:35:43

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'id': [1, 2, 3, 4, 5, 6] * 2,
                   'sales': [np.random.randint(100000, 999999) for _ in range(12)]})

Here is the output of df:

    id   sales  state
0    1  847754     CA
1    2  362532     WA
2    3  615849     CO
3    4  376480     AZ
4    5  381286     CA
5    6  411001     WA
6    1  946795     CO
7    2  857435     AZ
8    3  928087     CA
9    4  675593     WA
10   5  371339     CO
11   6  440285     AZ
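Because the sales column comes from an unseeded np.random.randint, rerunning the snippet above gives different numbers each time. A minimal sketch for rebuilding exactly the frame printed above, assuming you simply copy the printed values by hand (the column order id, sales, state follows the printout):

```python
import pandas as pd

# Values copied from the printed table; the original code draws "sales"
# with an unseeded np.random.randint, so a rerun would produce other numbers.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6] * 2,
    "sales": [847754, 362532, 615849, 376480, 381286, 411001,
              946795, 857435, 928087, 675593, 371339, 440285],
    "state": ["CA", "WA", "CO", "AZ"] * 3,
})
print(df)
```

Alternatively, calling np.random.seed(...) before building the frame makes reruns repeatable, although the values will not match the table above.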

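I can't figure out how to compute the cumulative percentage within each group, with sales sorted in descending order inside each group. I want output like this: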

    id   sales  state   cumsum              run_pct
0    2  857435     AZ   857435   0.5121460996296738
1    6  440285     AZ  1297720   0.7751284195436626
2    4  376480     AZ  1674200                  1.0
3    3  928087     CA   928087  0.43024216932985404
4    1  847754     CA  1775841   0.8232436013271356
5    5  381286     CA  2157127                  1.0
6    1  946795     CO   946795  0.48955704367618535
7    3  615849     CO  1562644    0.807992624547372
8    5  371339     CO  1933983                  1.0
9    4  675593     WA   675593  0.46620721731581655
10   6  411001     WA  1086594   0.7498271371847582
11   2  362532     WA  1449126                  1.0
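In the desired output, run_pct is each row's running total divided by its state's overall total; for AZ, for example, 1297720 / 1674200 ≈ 0.7751. A small sketch of that definition on made-up numbers (already sorted by state, then sales descending) that broadcasts each group's total with transform("sum") and checks that it agrees with accumulating per-row shares:

```python
import pandas as pd

# Made-up data, already sorted: state ascending, sales descending.
df = pd.DataFrame({
    "state": ["AZ", "AZ", "CA", "CA", "CA"],
    "sales": [300, 100, 500, 300, 200],
})
df["cumsum"] = df.groupby("state")["sales"].cumsum()

# transform("sum") returns a Series aligned with df: every row gets its group's total.
totals = df.groupby("state")["sales"].transform("sum")
print(totals.tolist())  # [400, 400, 1000, 1000, 1000]

# Running share computed two ways; both keep df's index, so they can be assigned back.
a = df["cumsum"] / totals
b = df.groupby("state")["sales"].transform(lambda x: (x / x.sum()).cumsum())
print(a.tolist())  # [0.75, 1.0, 0.5, 0.8, 1.0]
```

The two forms can differ in the last floating-point bit, because b adds up rounded fractions while a performs a single division per row.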

Best answer

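One possible solution is to sort the data first, then compute the cumulative sum, and finally the percentage. Sort by state ascending and by sales descending: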

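# states A to Z, sales high to low within each state; reset_index(drop=True) renumbers the rows 0..11
df = df.sort_values(['state', 'sales'], ascending=[True, False]).reset_index(drop=True)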
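The mixed-direction sort can be checked on a toy frame (the values are made up for illustration): passing a list to ascending sets the sort direction separately for each key.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["CA", "AZ", "CA", "AZ", "CA"],
    "sales": [100, 300, 500, 200, 400],
})

# state ascending (AZ before CA), sales descending inside each state;
# reset_index(drop=True) discards the old row labels and renumbers from 0.
out = df.sort_values(["state", "sales"], ascending=[True, False]).reset_index(drop=True)
print(out)
```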

Compute the cumulative sum:

df['cumsum'] = df.groupby('state')['sales'].cumsum()
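A small made-up example showing that groupby(...).cumsum() restarts the running total at the first row of each state while keeping the rows in their existing (sorted) order:

```python
import pandas as pd

# Made-up data, already sorted by state, then sales descending.
df = pd.DataFrame({
    "state": ["AZ", "AZ", "CA", "CA", "CA"],
    "sales": [300, 200, 500, 400, 100],
})

# The result keeps df's index, so it can be assigned straight back as a column.
df["cumsum"] = df.groupby("state")["sales"].cumsum()
print(df["cumsum"].tolist())  # [300, 500, 500, 900, 1000]
```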

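And the percentage. transform returns a result aligned with df's index, so it can be assigned straight back. The equivalent groupby('state')['sales'].apply(...) may fail on pandas 2.x, which prepends the group key to the index of the returned Series:

df['run_pct'] = df.groupby('state')['sales'].transform(lambda x: (x / x.sum()).cumsum())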

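This gives the output below. The sales values differ from the question's because np.random.randint is not seeded, and pandas prints run_pct rounded to six decimals: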

    id   sales  state   cumsum   run_pct
0    4  846079     AZ   846079  0.608566
1    2  312708     AZ  1158787  0.833491
2    6  231495     AZ  1390282  1.000000
3    3  790291     CA   790291  0.506795
4    1  554631     CA  1344922  0.862467
5    5  214467     CA  1559389  1.000000
6    1  983878     CO   983878  0.388139
7    5  779497     CO  1763375  0.695650
8    3  771486     CO  2534861  1.000000
9    6  794407     WA   794407  0.420899
10   2  587843     WA  1382250  0.732355
11   4  505155     WA  1887405  1.000000
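Putting the steps together, here is a sketch of a reusable helper. running_pct and its parameters are hypothetical names, not part of pandas. Run on the values copied from the question's table, it should reproduce the desired id, cumsum and run_pct columns:

```python
import pandas as pd


def running_pct(df: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): sort value_col descending within
    each group, then add the running total ("cumsum") and running share ("run_pct")."""
    out = df.sort_values([group_col, value_col], ascending=[True, False]).reset_index(drop=True)
    grouped = out.groupby(group_col)[value_col]
    out["cumsum"] = grouped.cumsum()
    # Divide each running total by its group's total, broadcast back onto every row.
    out["run_pct"] = out["cumsum"] / grouped.transform("sum")
    return out


# Values copied from the question's printed frame.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6] * 2,
    "sales": [847754, 362532, 615849, 376480, 381286, 411001,
              946795, 857435, 928087, 675593, 371339, 440285],
    "state": ["CA", "WA", "CO", "AZ"] * 3,
})
result = running_pct(df, "state", "sales")
print(result)
```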

Regarding "python - Cumulative running percentage within each group, with each group sorted in descending order", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51778892/
