gpt4 book ai didi

python - Pandas 根据时间条件连接两个表

转载 作者:太空宇宙 更新时间:2023-11-03 16:06:34 26 4
gpt4 key购买 nike

我正在尝试根据时间戳连接 Pandas 中的两个表。基本上结构看起来像这样:

表2

Timestamp          Truck     MineX           MineY
2016-08-27 01:10 CT77 -11346.36655 -650404.405
2016-08-27 01:12 CT45 -11596.88137 -648294.056
2016-08-27 01:13 CT67 -11953.16118 -648325.114
2016-08-27 01:13 CT75 -11326.54075 -650447.462
2016-08-27 01:14 CT79 -11380.27834 -650425.968
2016-08-27 01:15 CT26 -9493.153286 -652313.633
2016-08-27 01:16 CT73 -11527.47602 -650210.723
2016-08-27 01:16 CT40 -11596.90867 -648260.214
2016-08-27 01:17 CT26 -9493.153286 -652313.633
2016-08-27 01:17 CT80 -11363.34558 -650385.959
2016-08-27 01:17 CT72 -11527.47355 -650213.8

表1

Truck   LoadLocation    Tonnes  ArriveTimestamp
CT70 338-001 261 2016-02-21 00:23
CT66 338-001 271 2016-02-21 00:31
CT62 338-001 264 2016-02-21 00:45
CT73 338-001 254 2016-02-21 00:54
CT71 338-001 250 2016-02-21 01:04
CT39 338-001 182.172 2016-02-21 01:11
CT62 338-001 285 2016-02-21 01:19
CT70 338-001 282 2016-02-21 01:25
CT73 338-001 250 2016-02-21 01:30
CT73 338-001 275 2016-02-21 01:35
CT64 338-001 253 2016-02-21 01:42

表1和表2需要连接,其中Timestamp和ArriveTimeStamp相差一分钟以内,并且卡车ID相同。首选左连接,如果没有匹配,表 2 中的记录将被丢弃

最佳答案

您可以使用merge :

df = pd.merge(df1, df2, on='Truck', how='left')
print (df)
Truck LoadLocation Tonnes ArriveTimestamp Timestamp \
0 CT70 338-001 261.000 2016-02-21 00:23:00 NaT
1 CT66 338-001 271.000 2016-02-21 00:31:00 NaT
2 CT62 338-001 264.000 2016-02-21 00:45:00 NaT
3 CT73 338-001 254.000 2016-02-21 00:54:00 2016-08-27 01:16:00
4 CT71 338-001 250.000 2016-02-21 01:04:00 NaT
5 CT39 338-001 182.172 2016-02-21 01:11:00 NaT
6 CT62 338-001 285.000 2016-02-21 01:19:00 NaT
7 CT70 338-001 282.000 2016-02-21 01:25:00 NaT
8 CT73 338-001 250.000 2016-02-21 01:30:00 2016-08-27 01:16:00
9 CT73 338-001 275.000 2016-02-21 01:35:00 2016-08-27 01:16:00
10 CT64 338-001 253.000 2016-02-21 01:42:00 NaT

MineX MineY
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 -11527.47602 -650210.723
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 -11527.47602 -650210.723
9 -11527.47602 -650210.723
10 NaN NaN

boolean indexing ,其中过滤 datetimes 中的绝对差异 - 示例返回空 DataFrame:

print ((df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]'))
0 NaN
1 NaN
2 NaN
3 16244520.0
4 NaN
5 NaN
6 NaN
7 NaN
8 16242360.0
9 16242060.0
10 NaN
dtype: float64

print ((df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]').abs() < 60)
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
dtype: bool
Empty DataFrame

print (df[(df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]').abs() < 60])
Empty DataFrame
Columns: [Truck, LoadLocation, Tonnes, ArriveTimestamp, Timestamp, MineX, MineY]
Index: []

关于python - Pandas 根据时间条件连接两个表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39721008/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com