gpt4 book ai didi

python - 展平多索引数据框

转载 作者:行者123 更新时间:2023-12-05 09:04:20 26 4
gpt4 key购买 nike

我有一个df:

df = pd.DataFrame.from_dict({('group', ''): {0: 'A',
1: 'A',
2: 'A',
3: 'A',
4: 'A',
5: 'A',
6: 'A',
7: 'A',
8: 'A',
9: 'B',
10: 'B',
11: 'B',
12: 'B',
13: 'B',
14: 'B',
15: 'B',
16: 'B',
17: 'B',
18: 'all',
19: 'all'},
('category', ''): {0: 'Amazon',
1: 'Apple',
2: 'Facebook',
3: 'Google',
4: 'Netflix',
5: 'Tesla',
6: 'Total',
7: 'Uber',
8: 'total',
9: 'Amazon',
10: 'Apple',
11: 'Facebook',
12: 'Google',
13: 'Netflix',
14: 'Tesla',
15: 'Total',
16: 'Uber',
17: 'total',
18: 'Total',
19: 'total'},
(pd.Timestamp('2020-06-29'), 'last_sales'): {0: 195.0,
1: 61.0,
2: 106.0,
3: 61.0,
4: 37.0,
5: 13.0,
6: 954.0,
7: 4.0,
8: 477.0,
9: 50.0,
10: 50.0,
11: 75.0,
12: 43.0,
13: 17.0,
14: 14.0,
15: 504.0,
16: 3.0,
17: 252.0,
18: 2916.0,
19: 2916.0},
(pd.Timestamp('2020-06-29'), 'sales'): {0: 1268.85,
1: 18274.385000000002,
2: 19722.65,
3: 55547.255,
4: 15323.800000000001,
5: 1688.6749999999997,
6: 227463.23,
7: 1906.0,
8: 113731.615,
9: 3219.6499999999996,
10: 15852.060000000001,
11: 17743.7,
12: 37795.15,
13: 5918.5,
14: 1708.75,
15: 166349.64,
16: 937.01,
17: 83174.82,
18: 787625.7400000001,
19: 787625.7400000001},
(pd.Timestamp('2020-06-29'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2020-07-06'), 'last_sales'): {0: 26.0,
1: 39.0,
2: 79.0,
3: 49.0,
4: 10.0,
5: 10.0,
6: 436.0,
7: 5.0,
8: 218.0,
9: 89.0,
10: 34.0,
11: 133.0,
12: 66.0,
13: 21.0,
14: 20.0,
15: 732.0,
16: 3.0,
17: 366.0,
18: 2336.0,
19: 2336.0},
(pd.Timestamp('2020-07-06'), 'sales'): {0: 3978.15,
1: 12138.96,
2: 19084.175,
3: 40033.46000000001,
4: 4280.15,
5: 1495.1,
6: 165548.29,
7: 1764.15,
8: 82774.145,
9: 8314.92,
10: 12776.649999999996,
11: 28048.075,
12: 55104.21000000002,
13: 6962.844999999999,
14: 3053.2000000000003,
15: 231049.11000000002,
16: 1264.655,
17: 115524.55500000001,
18: 793194.8000000002,
19: 793194.8000000002},
(pd.Timestamp('2020-07-06'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-06-28'), 'last_sales'): {0: 96.0,
1: 56.0,
2: 106.0,
3: 44.0,
4: 34.0,
5: 13.0,
6: 716.0,
7: 9.0,
8: 358.0,
9: 101.0,
10: 22.0,
11: 120.0,
12: 40.0,
13: 13.0,
14: 8.0,
15: 610.0,
16: 1.0,
17: 305.0,
18: 2652.0,
19: 2652.0},
(pd.Timestamp('2021-06-28'), 'sales'): {0: 5194.95,
1: 19102.219999999994,
2: 22796.420000000002,
3: 30853.115,
4: 11461.25,
5: 992.6,
6: 188143.41,
7: 3671.15,
8: 94071.705,
9: 6022.299999999998,
10: 7373.6,
11: 33514.0,
12: 35943.45,
13: 4749.000000000001,
14: 902.01,
15: 177707.32,
16: 349.3,
17: 88853.66,
18: 731701.46,
19: 731701.46},
(pd.Timestamp('2021-06-28'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-07-07'), 'last_sales'): {0: 45.0,
1: 47.0,
2: 87.0,
3: 45.0,
4: 13.0,
5: 8.0,
6: 494.0,
7: 2.0,
8: 247.0,
9: 81.0,
10: 36.0,
11: 143.0,
12: 56.0,
13: 9.0,
14: 9.0,
15: 670.0,
16: 1.0,
17: 335.0,
18: 2328.0,
19: 2328.0},
(pd.Timestamp('2021-07-07'), 'sales'): {0: 7556.414999999998,
1: 14985.05,
2: 16790.899999999998,
3: 36202.729999999996,
4: 4024.97,
5: 1034.45,
6: 163960.32999999996,
7: 1385.65,
8: 81980.16499999998,
9: 5600.544999999999,
10: 11209.92,
11: 32832.61,
12: 42137.44500000001,
13: 3885.1499999999996,
14: 1191.5,
15: 194912.34000000003,
16: 599.0,
17: 97456.17000000001,
18: 717745.3400000001,
19: 717745.3400000001},
(pd.Timestamp('2021-07-07'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0}}).set_index(['group','category'])

我正在尝试将其扁平化,这样它就不再是多索引 df。由于有多个日期,我尝试选择一个:

df.loc[:,'2020-06-29 00:00:00']

但这给了我一个错误:

KeyError: '2020-06-29 00:00:00'

我正在努力使 2020-06-29 的第一周(也是我的最终输出)看起来像这样:

group   category            last_sales  sales       difference                                              
A Amazon 195.00 1,268.85 0.00
A Apple 61.00 18,274.39 0.00
A Facebook 106.00 19,722.65 0.00
A Google 61.00 55,547.25 0.00
A Netflix 37.00 15,323.80 0.00
A Tesla 13.00 1,688.67 0.00
A Total 954.00 227,463.23 0.00
A Uber 4.00 1,906.00 0.00
A total 477.00 113,731.62 0.00
B Amazon 0.00 3,219.65 0.00
B Apple 50.00 15,852.06 0.00
B Facebook 75.00 17,743.70 0.00
B Google 43.00 37,795.15 0.00
B Netflix 17.00 5,918.50 0.00
B Tesla 14.00 1,708.75 0.00
B Total 504.00 166,349.64 0.00
B Uber 3.00 937.01 0.00
B total 252.00 83,174.82 0.00
all Total 2,916.00 787,625.74 0.00

最佳答案

尝试通过 pd.to_dateime():

out=df.loc[:,pd.to_datetime('2020-06-29 00:00:00')]
#out=df.loc[:,pd.to_datetime('2020-06-29 00:00:00')].reset_index()

尝试通过 pd.Timestamp()

out=df.loc[:,pd.Timestamp('2020-06-29 00:00:00')]
#out=df.loc[:,pd.Timestamp('2020-06-29 00:00:00')].reset_index()

您列的第 0 级是时间戳,您可以通过以下方式验证:

df.columns.to_numpy()

#output

array([(Timestamp('2020-06-29 00:00:00'), 'last_sales'),
(Timestamp('2020-06-29 00:00:00'), 'sales'),
(Timestamp('2020-06-29 00:00:00'), 'difference'),
(Timestamp('2020-07-06 00:00:00'), 'last_sales'),
(Timestamp('2020-07-06 00:00:00'), 'sales'),
(Timestamp('2020-07-06 00:00:00'), 'difference'),
(Timestamp('2021-06-28 00:00:00'), 'last_sales'),
(Timestamp('2021-06-28 00:00:00'), 'sales'),
(Timestamp('2021-06-28 00:00:00'), 'difference'),
(Timestamp('2021-07-07 00:00:00'), 'last_sales'),
(Timestamp('2021-07-07 00:00:00'), 'sales'),
(Timestamp('2021-07-07 00:00:00'), 'difference')], dtype=object)

out的输出:

                last_sales       sales  difference
group category
A Amazon 195.0 1268.850 0.0
Apple 61.0 18274.385 0.0
Facebook 106.0 19722.650 0.0
Google 61.0 55547.255 0.0
Netflix 37.0 15323.800 0.0
Tesla 13.0 1688.675 0.0
Total 954.0 227463.230 0.0
Uber 4.0 1906.000 0.0
total 477.0 113731.615 0.0
B Amazon 50.0 3219.650 0.0
Apple 50.0 15852.060 0.0
Facebook 75.0 17743.700 0.0
Google 43.0 37795.150 0.0
Netflix 17.0 5918.500 0.0
Tesla 14.0 1708.750 0.0
Total 504.0 166349.640 0.0
Uber 3.0 937.010 0.0
total 252.0 83174.820 0.0
all Total 2916.0 787625.740 0.0
total 2916.0 787625.740 0.0

注意:

不需要在 .loc[] 中提供元组,因为您选择的是第 0 级

关于python - 展平多索引数据框,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/68815461/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com