gpt4 book ai didi

python - 根据日期范围合并数据框

转载 作者:太空狗 更新时间:2023-10-30 00:31:46 26 4
gpt4 key购买 nike

我有两个 pandas 数据框:一个 (df1) 三列(StartDateEndDateID) 和带有日期的第二个 (df2)。我想根据 df1.StartDatedf2.EndDate 之间的 df2.Date 合并 df1df2

df1 中的每个日期范围都是唯一的,并且不与数据框中的任何其他行重叠。

日期格式为 YYYY-MM-DD

最佳答案

只是为了提供一种使用 np.piecewise 的替代方法。性能甚至比 np.searchedsort 更快。​​

import pandas as pd
import numpy as np

# data
# ====================================
df1 = pd.DataFrame({'StartDate': pd.date_range('2010-01-01', periods=9, freq='5D'), 'EndDate': pd.date_range('2010-01-04', periods=9, freq='5D'), 'ID': np.arange(1, 10, 1)})

df2 = pd.DataFrame(dict(values=np.random.randn(50), date_time=pd.date_range('2010-01-01', periods=50, freq='D')))

df1.StartDate

Out[139]:
0 2010-01-01
1 2010-01-06
2 2010-01-11
3 2010-01-16
4 2010-01-21
5 2010-01-26
6 2010-01-31
7 2010-02-05
8 2010-02-10
Name: StartDate, dtype: datetime64[ns]

df2.date_time

Out[140]:
0 2010-01-01
1 2010-01-02
2 2010-01-03
3 2010-01-04
4 2010-01-05
5 2010-01-06
6 2010-01-07
7 2010-01-08
8 2010-01-09
9 2010-01-10
...
40 2010-02-10
41 2010-02-11
42 2010-02-12
43 2010-02-13
44 2010-02-14
45 2010-02-15
46 2010-02-16
47 2010-02-17
48 2010-02-18
49 2010-02-19
Name: date_time, dtype: datetime64[ns]


df2['ID_matched'] = np.piecewise(np.zeros(len(df2)), [(df2.date_time.values >= start_date)&(df2.date_time.values <= end_date) for start_date, end_date in zip(df1.StartDate.values, df1.EndDate.values)], df1.ID.values)


Out[143]:
date_time values ID_matched
0 2010-01-01 -0.2240 1
1 2010-01-02 -0.4202 1
2 2010-01-03 0.9998 1
3 2010-01-04 0.4310 1
4 2010-01-05 -0.6509 0
5 2010-01-06 -1.4987 2
6 2010-01-07 -1.2306 2
7 2010-01-08 0.1940 2
8 2010-01-09 -0.9984 2
9 2010-01-10 -0.3676 0
.. ... ... ...
40 2010-02-10 0.5242 9
41 2010-02-11 0.3451 9
42 2010-02-12 0.7244 9
43 2010-02-13 -2.0404 9
44 2010-02-14 -1.0798 0
45 2010-02-15 -0.6934 0
46 2010-02-16 -2.3380 0
47 2010-02-17 1.6623 0
48 2010-02-18 -0.2754 0
49 2010-02-19 -0.7466 0

[50 rows x 3 columns]

%timeit df2['ID_matched'] = np.piecewise(np.zeros(len(df2)), [(df2.date_time.values >= start_date)&(df2.date_time.values <= end_date) for start_date, end_date in zip(df1.StartDate.values, df1.EndDate.values)], df1.ID.values)
1000 loops, best of 3: 466 µs per loop

关于python - 根据日期范围合并数据框,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31328014/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com