gpt4 book ai didi

python - 带有 .loc 的多索引数据帧中的数学运算

转载 作者:行者123 更新时间:2023-12-04 14:54:04 25 4
gpt4 key购买 nike

我有一个多索引df:

                    created_at          2020-06-29                                       2020-07-06
sales orders last_sales differennce sales orders last_sales differennce

group category 10 10 10 0 10 10 20 50
A a1
a2
a3
B a1 ... ... ... ... ... ... ...
a2
a3
all Total 100 100 100 0 150 150 150 0

我正在尝试计算 all & Total 索引中的 difference,即 ((sales/last_sales) -1) * 100 其中created_at2020-06-29 到 2021-07-19

我试过:

df.loc[('all','Total'),(slice(None),'difference')] = 
((df.loc[('all','Total'),(slice(None),'sales')] /
df.loc[('all','Total'),:'2021-07-19','last_sales']) - 1) * 100

但是我得到一个错误:

IndexError: list index out of range

我似乎无法选择我想要的 created_at 范围,但这有效:

df.loc[:,:'2021-07-19'] # returning the dates I want, from first to selected

但是当我尝试选择它中断的一行时:

df_out.loc[:,(:'2021-07-19','sales')] # SyntaxError: invalid syntax

&

df_out.loc[:,:'2021-07-19','sales'] # IndexError: list index out of range

我如何选择 group = allcategory = Total 和日期从 2020-06-29 到 2020-07- 的所有行19?

我也试过:

df.loc[('all','Total'),(slice(None),'difference')] = 
((df.loc[('all','Total'),(slice(None),'sales')] /
df.loc[('all','Total'),(slice(None),'last_sales')]) - 1) * 100 # slice(None) instead of date

但这会给我带来 ZeroDivisionError: float division by zero 因为 2021-07-19 以上的日期有 last_sales = 0。也许还有其他处理 0 的方法?我尝试像这样添加 1 df.loc[('all','Total'),(slice(None),'last_sales')]+1) - 1) * 100 但我仍然遇到同样的错误。

示例df:

df = pd.DataFrame.from_dict({('group', ''): {0: 'A',
1: 'A',
2: 'A',
3: 'A',
4: 'A',
5: 'A',
6: 'A',
7: 'A',
8: 'A',
9: 'B',
10: 'B',
11: 'B',
12: 'B',
13: 'B',
14: 'B',
15: 'B',
16: 'B',
17: 'B',
18: 'all',
19: 'all'},
('category', ''): {0: 'Amazon',
1: 'Apple',
2: 'Facebook',
3: 'Google',
4: 'Netflix',
5: 'Tesla',
6: 'Total',
7: 'Uber',
8: 'total',
9: 'Amazon',
10: 'Apple',
11: 'Facebook',
12: 'Google',
13: 'Netflix',
14: 'Tesla',
15: 'Total',
16: 'Uber',
17: 'total',
18: 'Total',
19: 'total'},
(pd.Timestamp('2020-06-29 00:00:00'), 'last_sales'): {0: 195.0,
1: 61.0,
2: 106.0,
3: 61.0,
4: 37.0,
5: 13.0,
6: 954.0,
7: 4.0,
8: 477.0,
9: 50.0,
10: 50.0,
11: 75.0,
12: 43.0,
13: 17.0,
14: 14.0,
15: 504.0,
16: 3.0,
17: 252.0,
18: 2916.0,
19: 2916.0},
(pd.Timestamp('2020-06-29 00:00:00'), 'sales'): {0: 1268.85,
1: 18274.385000000002,
2: 19722.65,
3: 55547.255,
4: 15323.800000000001,
5: 1688.6749999999997,
6: 227463.23,
7: 1906.0,
8: 113731.615,
9: 3219.6499999999996,
10: 15852.060000000001,
11: 17743.7,
12: 37795.15,
13: 5918.5,
14: 1708.75,
15: 166349.64,
16: 937.01,
17: 83174.82,
18: 787625.7400000001,
19: 787625.7400000001},
(pd.Timestamp('2020-06-29 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2020-07-06 00:00:00'), 'last_sales'): {0: 26.0,
1: 39.0,
2: 79.0,
3: 49.0,
4: 10.0,
5: 10.0,
6: 436.0,
7: 5.0,
8: 218.0,
9: 89.0,
10: 34.0,
11: 133.0,
12: 66.0,
13: 21.0,
14: 20.0,
15: 732.0,
16: 3.0,
17: 366.0,
18: 2336.0,
19: 2336.0},
(pd.Timestamp('2020-07-06 00:00:00'), 'sales'): {0: 3978.15,
1: 12138.96,
2: 19084.175,
3: 40033.46000000001,
4: 4280.15,
5: 1495.1,
6: 165548.29,
7: 1764.15,
8: 82774.145,
9: 8314.92,
10: 12776.649999999996,
11: 28048.075,
12: 55104.21000000002,
13: 6962.844999999999,
14: 3053.2000000000003,
15: 231049.11000000002,
16: 1264.655,
17: 115524.55500000001,
18: 793194.8000000002,
19: 793194.8000000002},
(pd.Timestamp('2020-07-06 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-06-28 00:00:00'), 'last_sales'): {0: 96.0,
1: 56.0,
2: 106.0,
3: 44.0,
4: 34.0,
5: 13.0,
6: 716.0,
7: 9.0,
8: 358.0,
9: 101.0,
10: 22.0,
11: 120.0,
12: 40.0,
13: 13.0,
14: 8.0,
15: 610.0,
16: 1.0,
17: 305.0,
18: 2652.0,
19: 2652.0},
(pd.Timestamp('2021-06-28 00:00:00'), 'sales'): {0: 5194.95,
1: 19102.219999999994,
2: 22796.420000000002,
3: 30853.115,
4: 11461.25,
5: 992.6,
6: 188143.41,
7: 3671.15,
8: 94071.705,
9: 6022.299999999998,
10: 7373.6,
11: 33514.0,
12: 35943.45,
13: 4749.000000000001,
14: 902.01,
15: 177707.32,
16: 349.3,
17: 88853.66,
18: 731701.46,
19: 731701.46},
(pd.Timestamp('2021-06-28 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-07-07 00:00:00'), 'last_sales'): {0: 45.0,
1: 47.0,
2: 87.0,
3: 45.0,
4: 13.0,
5: 8.0,
6: 494.0,
7: 2.0,
8: 247.0,
9: 81.0,
10: 36.0,
11: 143.0,
12: 56.0,
13: 9.0,
14: 9.0,
15: 670.0,
16: 1.0,
17: 335.0,
18: 2328.0,
19: 2328.0},
(pd.Timestamp('2021-07-07 00:00:00'), 'sales'): {0: 7556.414999999998,
1: 14985.05,
2: 16790.899999999998,
3: 36202.729999999996,
4: 4024.97,
5: 1034.45,
6: 163960.32999999996,
7: 1385.65,
8: 81980.16499999998,
9: 5600.544999999999,
10: 11209.92,
11: 32832.61,
12: 42137.44500000001,
13: 3885.1499999999996,
14: 1191.5,
15: 194912.34000000003,
16: 599.0,
17: 97456.17000000001,
18: 717745.3400000001,
19: 717745.3400000001},
(pd.Timestamp('2021-07-07 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0}}).set_index(['group','category'])

最佳答案

我们可以使用 pd.IndexSlice 并且我们需要在 pd.Timestamp 上切片,因为您的第一级列是 datetime 类型:

idx = pd.IndexSlice
sales = df.loc[('all','Total'),(slice(None),'sales')].droplevel(level=1)
last_sales = df.loc[('all','Total'), :pd.Timestamp("2021-07-07")].loc[idx[:, "last_sales"]]

df.loc[('all','Total'),(slice(None),'difference')] = sales.div(last_sales).sub(1).mul(100).to_numpy()
               2020-06-29 00:00:00                      2020-07-06 00:00:00                      2021-06-28 00:00:00                      2021-07-07 00:00:00                     
last_sales sales difference last_sales sales difference last_sales sales difference last_sales sales difference
group category
A Amazon 195.00 1268.85 0.00 26.00 3978.15 0.00 96.00 5194.95 0.00 45.00 7556.41 0.00
Apple 61.00 18274.39 0.00 39.00 12138.96 0.00 56.00 19102.22 0.00 47.00 14985.05 0.00
Facebook 106.00 19722.65 0.00 79.00 19084.17 0.00 106.00 22796.42 0.00 87.00 16790.90 0.00
Google 61.00 55547.25 0.00 49.00 40033.46 0.00 44.00 30853.12 0.00 45.00 36202.73 0.00
Netflix 37.00 15323.80 0.00 10.00 4280.15 0.00 34.00 11461.25 0.00 13.00 4024.97 0.00
Tesla 13.00 1688.67 0.00 10.00 1495.10 0.00 13.00 992.60 0.00 8.00 1034.45 0.00
Total 954.00 227463.23 0.00 436.00 165548.29 0.00 716.00 188143.41 0.00 494.00 163960.33 0.00
Uber 4.00 1906.00 0.00 5.00 1764.15 0.00 9.00 3671.15 0.00 2.00 1385.65 0.00
total 477.00 113731.62 0.00 218.00 82774.15 0.00 358.00 94071.71 0.00 247.00 81980.16 0.00
B Amazon 50.00 3219.65 0.00 89.00 8314.92 0.00 101.00 6022.30 0.00 81.00 5600.54 0.00
Apple 50.00 15852.06 0.00 34.00 12776.65 0.00 22.00 7373.60 0.00 36.00 11209.92 0.00
Facebook 75.00 17743.70 0.00 133.00 28048.08 0.00 120.00 33514.00 0.00 143.00 32832.61 0.00
Google 43.00 37795.15 0.00 66.00 55104.21 0.00 40.00 35943.45 0.00 56.00 42137.45 0.00
Netflix 17.00 5918.50 0.00 21.00 6962.84 0.00 13.00 4749.00 0.00 9.00 3885.15 0.00
Tesla 14.00 1708.75 0.00 20.00 3053.20 0.00 8.00 902.01 0.00 9.00 1191.50 0.00
Total 504.00 166349.64 0.00 732.00 231049.11 0.00 610.00 177707.32 0.00 670.00 194912.34 0.00
Uber 3.00 937.01 0.00 3.00 1264.65 0.00 1.00 349.30 0.00 1.00 599.00 0.00
total 252.00 83174.82 0.00 366.00 115524.56 0.00 305.00 88853.66 0.00 335.00 97456.17 0.00
all Total 2916.00 787625.74 26910.48 2336.00 793194.80 33855.26 2652.00 731701.46 27490.55 2328.00 717745.34 30730.99
total 2916.00 787625.74 0.00 2336.00 793194.80 0.00 2652.00 731701.46 0.00 2328.00 717745.34 0.00

关于python - 带有 .loc 的多索引数据帧中的数学运算,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/68481367/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com