
python - Reindex a MultiIndex on a level whose dates are "close"

Reposted. Author: 太空狗. Updated: 2023-10-30 00:58:14

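Question

I have a pandas.Series with a two-level pandas.MultiIndex. The first level holds dates. I have another DatetimeIndex whose values are close to some of the dates in my series.index.levels[0]. I want to reindex my series with the dates from the "other" DatetimeIndex that are close enough to existing dates in the index. Assume that by "close" I mean within 2 days.

Setup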

import pandas as pd
import numpy as np

np.random.seed([3, 1415])

chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

#                  Equal Date    + 3 Days      - 1 Day       + 2 Days
i0 = pd.to_datetime(
    [              '2018-11-30', '2018-12-16', '2018-12-30', '2019-01-17'])
i1 = pd.to_datetime(
    ['2018-10-31', '2018-11-30', '2018-12-13', '2018-12-31', '2019-01-15', '2019-01-31'])
#                  Include       Skip          Include       Include

lvl0 = i0.repeat(5)
lvl1 = np.concatenate(
    [np.random.choice([*chars], size=5, replace=False) for _ in range(4)])

midx = pd.MultiIndex.from_tuples([*zip(lvl0, lvl1)], names=['date', 'ID'])

s0 = pd.Series(np.arange(4).repeat(5), midx, name='stuff')

s0

date        ID
2018-11-30  S     0
            O     0
            J     0
            H     0
            D     0
2018-12-16  Q     1
            B     1
            A     1
            S     1
            P     1
2018-12-30  U     2
            S     2
            A     2
            J     2
            L     2
2019-01-17  K     3
            U     3
            V     3
            S     3
            H     3
Name: stuff, dtype: int64

What I want

Note: the dtype is the same as the original.

date        ID
2018-11-30  S     0
            O     0
            J     0
            H     0
            D     0
2018-12-31  U     2
            S     2
            A     2
            J     2
            L     2
2019-01-15  K     3
            U     3
            V     3
            S     3
            H     3
Name: stuff, dtype: int64

What I did

tol = pd.Timedelta('2D')

# 0. This should be the same as the `i0` I used to set up
# But supposing that wasn't available, we would...
i0 = s0.index.levels[0]

# 1. Broadcast date differences
# 2. Take the absolute value
# 3. Find the position of minimum absolute value for each row
# 4. Define a proposal of new index level values with those positions
#    (.values hands plain NumPy arrays to the ufunc)
i_proposal = i1[np.abs(np.subtract.outer(i0.values, i1.values)).argmin(1)]

# 5. Use proposal to get which ones are within the
#    tolerance of 2 days
i_final = i_proposal[np.abs(i_proposal - i0) <= tol]

# 6. set_levels with proposal, because at this point
#    there is a one-to-one correspondence.
#    The `inplace` argument was removed in pandas 2.0, so assign the result back.
s0.index = s0.index.set_levels(i_proposal, level=0)

# 7. use `loc` to pull out the final ones
s0.loc[i_final]

date        ID
2018-11-30  S     0
            O     0
            J     0
            H     0
            D     0
2018-12-31  U     2
            S     2
            A     2
            J     2
            L     2
2019-01-15  K     3
            U     3
            V     3
            S     3
            H     3
Name: stuff, dtype: int64

Problems with my solution

  1. It is the opposite of "smooth".
  2. It operates in place on s0.index.
  3. It is O(len(i0) * len(i1)). There should be an O(len(i0) + len(i1)) solution.

Can anyone think of a better way?

Best answer

This is very close to what cs95 did by using reindex.

s, y = i1.reindex(s0.index.levels[0], tolerance=pd.Timedelta(days=2), method='nearest')

s0.loc[s[y != -1]]
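
To make the two return values above easier to read: Index.reindex hands back the target index together with an integer indexer into i1, where -1 marks a target date with no neighbour inside the tolerance. The sketch below is only an illustration on the question's dates. It uses Index.get_indexer, which should yield the same kind of indexer on its own, and the names matched_old and matched_new are made up here.

```python
import pandas as pd

# Dates copied from the question's setup
i0 = pd.to_datetime(['2018-11-30', '2018-12-16', '2018-12-30', '2019-01-17'])
i1 = pd.to_datetime(['2018-10-31', '2018-11-30', '2018-12-13',
                     '2018-12-31', '2019-01-15', '2019-01-31'])
tol = pd.Timedelta(days=2)

# For each date in i0: position of the nearest date in i1,
# or -1 when the nearest one is more than `tol` away.
indexer = i1.get_indexer(i0, method='nearest', tolerance=tol)
print(indexer.tolist())  # [1, -1, 3, 4]

# Hypothetical helper names: the i0 dates that found a partner,
# and the i1 dates they were paired with.
found = indexer != -1
matched_old = i0[found]
matched_new = i1[indexer[found]]
print(matched_new.strftime('%Y-%m-%d').tolist())
```

Here 2018-12-16 is dropped because its nearest neighbour, 2018-12-13, is 3 days away, while 2019-01-17 is kept because a difference of exactly 2 days still counts as within the tolerance.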

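If you need to change the first index level to the dates from i1:

s = s0.index.levels[0].values
# .values gives a NumPy array; 2-D indexing directly on an Index is not supported in pandas 2.x
t = abs(i1.values[:, None] - s) / np.timedelta64(1, 'D') <= 2

f = s0.loc[s[t.any(0)]].reset_index(level=1)

f.index = f.index.map(dict(zip(s[t.any(0)], i1[t.any(1)])))
f.set_index('ID', append=True, inplace=True)
f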
Out[458]:
               stuff
date       ID
2018-11-30 S       0
           O       0
           J       0
           H       0
           D       0
2018-12-31 U       2
           S       2
           A       2
           J       2
           L       2
2019-01-15 K       3
           U       3
           V       3
           S       3
           H       3
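
The boolean matrix t in the snippet above may be easier to follow in isolation. The following is a minimal sketch on the question's dates, with illustrative variable names (diff_days, within) that do not appear in the answer. Rows follow i1 and columns follow i0.

```python
import numpy as np
import pandas as pd

# Dates copied from the question's setup
i0 = pd.to_datetime(['2018-11-30', '2018-12-16', '2018-12-30', '2019-01-17'])
i1 = pd.to_datetime(['2018-10-31', '2018-11-30', '2018-12-13',
                     '2018-12-31', '2019-01-15', '2019-01-31'])

# Pairwise absolute differences in days, shape (len(i1), len(i0)).
# .values yields NumPy arrays, which broadcast to 2-D.
diff_days = np.abs(i1.values[:, None] - i0.values) / np.timedelta64(1, 'D')
within = diff_days <= 2

# Which i0 dates have some i1 date within 2 days (column-wise any)
print(within.any(axis=0).tolist())  # [True, False, True, True]

# Which i1 dates are within 2 days of some i0 date (row-wise any)
print(within.any(axis=1).tolist())  # [False, True, False, True, True, False]
```

Pairing the two filtered arrays with zip, as the answer does, relies on both indexes being sorted and on every kept i0 date matching exactly one i1 date. That holds for this data but is an assumption in general. The matrix also has len(i0) * len(i1) entries, so this approach keeps the quadratic cost mentioned in the question.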

piR's edit

This is how I adapted it:

lvl0, lvl1 = s0.index.levels
_, indexer = i1.reindex(lvl0, tolerance=tol, method='nearest')
newlvl0 = i1[indexer]
msklvl0 = newlvl0[indexer != -1]

newidx = s0.index.set_levels([newlvl0, lvl1])
# set_axis returns a new object; its `inplace` argument was removed in pandas 2.0
s0.set_axis(newidx).loc[msklvl0]

date        ID
2018-11-30  S     0
            O     0
            J     0
            H     0
            D     0
2018-12-31  U     2
            S     2
            A     2
            J     2
            L     2
2019-01-15  K     3
            U     3
            V     3
            S     3
            H     3
Name: stuff, dtype: int64
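
On the Big-O concern raised in the question, one further option, which is not part of the answers above, is pd.merge_asof. It requires both date columns to be sorted and, as far as I understand its implementation, walks them in a single merge-style pass. The sketch below is an assumption-laden illustration: it uses a smaller stand-in series with two IDs per date, and the names left, right, pairs, mapping, new_date and result are invented for this example.

```python
import numpy as np
import pandas as pd

# Dates copied from the question; the series is a smaller stand-in
i0 = pd.to_datetime(['2018-11-30', '2018-12-16', '2018-12-30', '2019-01-17'])
i1 = pd.to_datetime(['2018-10-31', '2018-11-30', '2018-12-13',
                     '2018-12-31', '2019-01-15', '2019-01-31'])
midx = pd.MultiIndex.from_product([i0, list('AB')], names=['date', 'ID'])
s0 = pd.Series(np.arange(4).repeat(2), midx, name='stuff')

# Both frames must be sorted on the key for merge_asof
left = pd.DataFrame({'date': s0.index.levels[0]})
right = pd.DataFrame({'date': i1, 'new_date': i1})

# Nearest i1 date per existing date; NaT when nothing is within 2 days
pairs = pd.merge_asof(left, right, on='date',
                      direction='nearest', tolerance=pd.Timedelta('2D'))
pairs = pairs.dropna(subset=['new_date'])
mapping = dict(zip(pairs['date'], pairs['new_date']))

# Keep only rows whose date found a partner, then swap in the new dates.
# s0 itself is left untouched.
dates = s0.index.get_level_values('date')
kept = s0[dates.isin(pairs['date'])]
new_index = pd.MultiIndex.from_arrays(
    [kept.index.get_level_values('date').map(mapping),
     kept.index.get_level_values('ID')],
    names=['date', 'ID'])
result = kept.set_axis(new_index)
print(result)
```

Because rows are filtered rather than reindexed, no missing values are introduced and the dtype of the result stays the same as that of s0.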

Regarding "python - Reindex a MultiIndex on a level whose dates are "close"", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56781728/
