python - How to filter a MultiIndex DataFrame by a date slice on the second level

Reposted. Author: 行者123. Updated: 2023-12-05 04:51:55

I have a DataFrame whose second index level holds dates. How can I filter the rows that fall between two dates?
The code below generates the DataFrame:

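import numpy as np
import pandas as pd

# freq='M' is the month-end alias; pandas 2.2+ deprecates it in favor of 'ME'
dates = pd.date_range(start='2015-01-01', end='2018-12-01', freq='M')
persons = ['John', 'Paul', 'Susan', 'Steve', 'Anne', 'Carol']
miindex = pd.MultiIndex.from_product([persons, dates],
                                     names=['persons', 'dates'])
# 6 persons x 47 month-ends = 282 rows
df = pd.DataFrame(np.random.randn(len(miindex), 4), columns=list('ABCD'), index=miindex)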

                           A         B         C         D
persons dates
John    2015-01-31 -1.381854  0.438590 -1.838329  0.085944
        2015-02-28 -1.870273  0.040513  1.116906  0.473218
        2015-03-31  0.522960 -0.190412 -0.650339 -0.532672
        2015-04-30  0.147605 -0.045129  1.209839  1.831272
        2015-05-31 -0.331290 -0.413971 -2.418138  0.149583
...                      ...       ...       ...       ...
Carol   2018-07-31 -0.344657  0.871752 -0.040436  0.132283
        2018-08-31  0.168781  0.776657 -0.103212 -0.082286
        2018-09-30  0.019738  0.151568 -0.794741 -1.316847
        2018-10-31 -1.047699  0.913352  1.009840  0.070882
        2018-11-30 -1.360346 -0.850818 -0.824563  0.305373

How can I filter the rows whose dates are:

  • Within 2016
  • Between 2015 and 2017
  • From January 2, 2016 onward
  • From 01-01-2018 onward

For example, filtering from 01-01-2018 onward, I should get:

                           A         B         C         D
persons dates
John    2018-01-31  1.092697 -0.534817  1.498770 -0.746335
        2018-02-28  0.141443  0.286186 -0.652946 -0.331205
        2018-03-31 -0.547728  0.942533 -0.315792 -1.564275
        2018-04-30  2.383790  1.117817 -0.419611  1.603313
        2018-05-31  0.405304 -1.468452 -0.713453  0.605490
...                      ...       ...       ...       ...
Carol   2018-07-31  0.711990  0.615596  1.198836  2.283507
        2018-08-31 -0.071486 -0.102290 -1.855148  0.284160
        2018-09-30  1.461128 -1.163214  1.142434  0.183197
        2018-10-31 -1.994097 -0.275098  0.877738 -1.094145
        2018-11-30  0.225581  2.194110  0.160663  1.582566

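Note that the values in columns A, B, C, and D in this output should be ignored: the DataFrame is randomly generated, so only the index reflects what I expect to see.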

Best answer

Use partial string indexing with the MultiIndex, but first sort the index with DataFrame.sort_index:

df = df.sort_index()
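
The sort matters because slicing on an inner level generally needs a lexsorted MultiIndex, and the index built in the question is not sorted (the names are not in alphabetical order). A minimal sketch of that check follows. The seed and the names `unsorted` and `ordered` are illustrative assumptions, and `pd.offsets.MonthEnd()` stands in for the `'M'` alias so the snippet does not depend on which alias a given pandas version accepts:

```python
import numpy as np
import pandas as pd

# Rebuild a frame with the same shape as the one in the question
dates = pd.date_range(start="2015-01-01", end="2018-12-01",
                      freq=pd.offsets.MonthEnd())
persons = ["John", "Paul", "Susan", "Steve", "Anne", "Carol"]
index = pd.MultiIndex.from_product([persons, dates],
                                   names=["persons", "dates"])

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
unsorted = pd.DataFrame(rng.standard_normal((len(index), 4)),
                        columns=list("ABCD"), index=index)

# from_product keeps the given order of names, so the index is not sorted
print(unsorted.index.is_monotonic_increasing)  # False

ordered = unsorted.sort_index()
print(ordered.index.is_monotonic_increasing)   # True

# Once sorted, partial string indexing on the second level works
idx = pd.IndexSlice
print(ordered.loc[idx[:, "2016"], :].shape)    # (72, 4)
```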

idx = pd.IndexSlice
print(df.loc[idx[:, "2016"], :])
                           A         B         C         D
persons dates
Anne    2016-01-31  1.189332  1.240492  1.948487  1.049944
        2016-02-29  0.155651  0.172096 -1.315934  2.447474
        2016-03-31  0.258901  1.052156  0.194412  0.551807
        2016-04-30  0.817727 -0.039305  0.196576 -1.163072
        2016-05-31 -0.379003 -0.640898 -0.412814 -0.507134
...                      ...       ...       ...       ...
Susan   2016-08-31  0.944875  0.655981 -1.167568  1.087909
        2016-09-30 -0.533770  0.271889  0.743089 -1.021702
        2016-10-31 -0.548632  0.980111  1.288285 -1.130429
        2016-11-30  0.843035 -1.019152  0.394127  0.375720
        2016-12-31  0.789154  0.660676 -0.097020 -0.392890

[72 rows x 4 columns]

print(df.loc[idx[:, "2015":"2017"], :])
                           A         B         C         D
persons dates
Anne    2015-01-31  0.340056 -0.084973 -0.160449  0.476274
        2015-02-28  1.521403  2.075643 -0.089913 -3.556345
        2015-03-31  1.871844 -1.933054  0.360196 -1.184768
        2015-04-30  1.996072 -0.671001  1.001818  0.787014
        2015-05-31  0.642655 -0.685923 -0.854484 -0.311828
...                      ...       ...       ...       ...
Susan   2017-08-31 -0.349868  1.095051  0.950181  1.365780
        2017-09-30  0.937602  0.456578  0.169026 -0.559212
        2017-10-31 -0.404749  0.595979 -0.434110  2.312148
        2017-11-30  1.381366 -1.470635  0.773891 -0.686727
        2017-12-31 -0.611788  0.963277  0.564169 -0.647526

[216 rows x 4 columns]

print(df.loc[idx[:, "01-02-2016":], :])
                           A         B         C         D
persons dates
Anne    2016-01-31  1.189332  1.240492  1.948487  1.049944
        2016-02-29  0.155651  0.172096 -1.315934  2.447474
        2016-03-31  0.258901  1.052156  0.194412  0.551807
        2016-04-30  0.817727 -0.039305  0.196576 -1.163072
        2016-05-31 -0.379003 -0.640898 -0.412814 -0.507134
...                      ...       ...       ...       ...
Susan   2018-07-31 -0.180213 -0.613854 -0.143997  0.938364
        2018-08-31 -1.232334 -1.066170  2.074717 -0.219996
        2018-09-30 -0.014457  0.350130 -0.920580  0.040339
        2018-10-31  1.651722 -0.399346 -1.647574  0.323075
        2018-11-30  1.465342  0.182188  0.039446 -1.155651

[210 rows x 4 columns]
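
The string "01-02-2016" is read month-first here, so it means January 2, 2016 rather than February 1, which is why the 2016-01-31 rows are included. As a hedged sketch of a less ambiguous spelling, the lower bound can be written as an ISO-formatted string or as an explicit `pd.Timestamp`. The frame is rebuilt with an arbitrary seed, and the names `by_iso` and `by_ts` are illustrative:

```python
import numpy as np
import pandas as pd

# Same index layout as in the question, with an arbitrary fixed seed
dates = pd.date_range(start="2015-01-01", end="2018-12-01",
                      freq=pd.offsets.MonthEnd())
persons = ["John", "Paul", "Susan", "Steve", "Anne", "Carol"]
index = pd.MultiIndex.from_product([persons, dates],
                                   names=["persons", "dates"])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(index), 4)),
                  columns=list("ABCD"), index=index).sort_index()

idx = pd.IndexSlice

# pandas parses this string month-first: January 2, 2016
print(pd.Timestamp("01-02-2016"))  # 2016-01-02 00:00:00

# Two unambiguous ways to express the same lower bound
by_iso = df.loc[idx[:, "2016-01-02":], :]
by_ts = df.loc[idx[:, pd.Timestamp("2016-01-02"):], :]

print(len(by_iso), len(by_ts))  # 210 210
```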

print(df.loc[idx[:, "01-01-2018":], :])
                           A         B         C         D
persons dates
Anne    2018-01-31  0.072784 -0.093604 -0.896780 -0.336099
        2018-02-28 -0.591907 -0.439462 -0.189500  0.172523
        2018-03-31  0.027810 -0.932447  0.547707 -0.148938
        2018-04-30 -0.114616  0.116554 -0.840459 -1.807368
        2018-05-31 -0.017403  0.562685  0.157102  1.739236
...                      ...       ...       ...       ...
Susan   2018-07-31 -0.180213 -0.613854 -0.143997  0.938364
        2018-08-31 -1.232334 -1.066170  2.074717 -0.219996
        2018-09-30 -0.014457  0.350130 -0.920580  0.040339
        2018-10-31  1.651722 -0.399346 -1.647574  0.323075
        2018-11-30  1.465342  0.182188  0.039446 -1.155651

[66 rows x 4 columns]
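
The four filters can also be sketched with boolean masks built from `Index.get_level_values`. This is not part of the answer above; it is a commonly used alternative that does not require a sorted index and keeps the original row order. The seed and the variable names are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Same index layout as in the question; note that it is left unsorted
dates = pd.date_range(start="2015-01-01", end="2018-12-01",
                      freq=pd.offsets.MonthEnd())
persons = ["John", "Paul", "Susan", "Steve", "Anne", "Carol"]
index = pd.MultiIndex.from_product([persons, dates],
                                   names=["persons", "dates"])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(index), 4)),
                  columns=list("ABCD"), index=index)

# One DatetimeIndex entry per row, taken from the second level
d = df.index.get_level_values("dates")

in_2016 = df[d.year == 2016]
between_2015_2017 = df[(d >= pd.Timestamp("2015-01-01"))
                       & (d <= pd.Timestamp("2017-12-31"))]
from_2016_01_02 = df[d >= pd.Timestamp("2016-01-02")]
from_2018 = df[d >= pd.Timestamp("2018-01-01")]

for part in (in_2016, between_2015_2017, from_2016_01_02, from_2018):
    print(len(part))
# 72, 216, 210, 66: the same row counts as the IndexSlice results above
```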

Regarding "python - How to filter a MultiIndex DataFrame by a date slice on the second level", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66681726/
