python - Fill df1 with data from df2 using the IP column and date range of the two DataFrames

Reposted. Author: 行者123. Updated: 2023-12-01 07:28:25

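I am working with two DataFrames. The first has incomplete information. The second contains the first-seen and last-seen time range for each source address. I am trying to fill in sourcehostname and sourceusername in df1 from df2, matching on sourceaddress, wherever the datetime in df1 falls within that time range.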

df1
       sourceaddress  sourcehostname  sourceusername        endtime             datetime
    0      10.0.0.59       computer1             NaN  1564666638000  2019-08-01 09:37:18
    1      10.0.0.59             NaN             NaN  1564666640000  2019-08-01 09:37:20
    2      10.0.0.59             NaN             NaN  1564666642000  2019-08-01 09:37:22
    3      10.0.0.59             NaN             NaN  1564666643000  2019-08-01 09:37:23
    4      10.0.0.59             NaN             NaN  1564666643000  2019-08-01 09:37:23
    5      10.0.0.59             NaN             NaN  1564666645000  2019-08-01 09:37:25
    6      10.0.0.59       computer1             NaN  1564666646000  2019-08-01 09:37:26
    7      10.0.0.59             NaN             NaN  1564666646000  2019-08-01 09:37:26
    8      10.0.0.59       computer1             NaN  1564666649000  2019-08-01 09:37:29
    9      10.0.0.59       computer1             NaN  1564666650000  2019-08-01 09:37:30
   10      10.0.0.59             NaN             NaN  1564666850000  2019-08-01 09:40:50
...
43196     10.0.0.187       computer2             NaN  1564718395000  2019-08-01 23:59:55
43197     10.0.0.187       computer2           user1  1564718397000  2019-08-01 23:59:57
43198     10.0.0.187       computer2             NaN  1564718397000  2019-08-01 23:59:57
43199     10.0.0.187       computer2           user1  1564718398000  2019-08-01 23:59:58
43200     10.0.0.187             NaN             NaN  1564718398000  2019-08-01 23:59:58
43201     10.0.0.187       computer2           user1  1564718398000  2019-08-01 23:59:58

df2
   sourceaddress  sourcehostname  sourceusername            firstseen             lastseen
0      10.0.0.59       computer1           user1  2019-08-01 09:37:59  2019-08-01 09:46:08
1     10.0.0.187       computer2           user1  2019-08-01 00:00:03  2019-08-01 23:59:58
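
To experiment with the question, the two frames can be rebuilt from a few of the rows printed above. This is only a sketch: it copies five df1 rows (indices 0, 10, 43196, 43197 and 43200) rather than the full data, and it assumes the datetime, firstseen and lastseen columns are real datetime64 values rather than strings, which the comparison in the answer relies on.

```python
import numpy as np
import pandas as pd

# A handful of rows copied from the df1 printout (not the full 43,202 rows).
df1 = pd.DataFrame({
    "sourceaddress": ["10.0.0.59", "10.0.0.59",
                      "10.0.0.187", "10.0.0.187", "10.0.0.187"],
    "sourcehostname": ["computer1", np.nan, "computer2", "computer2", np.nan],
    "sourceusername": [np.nan, np.nan, np.nan, "user1", np.nan],
    "endtime": [1564666638000, 1564666850000,
                1564718395000, 1564718397000, 1564718398000],
    # Assumption: parsed as datetimes, not kept as strings.
    "datetime": pd.to_datetime([
        "2019-08-01 09:37:18", "2019-08-01 09:40:50",
        "2019-08-01 23:59:55", "2019-08-01 23:59:57", "2019-08-01 23:59:58",
    ]),
})

# Both rows of df2, one time range per source address.
df2 = pd.DataFrame({
    "sourceaddress": ["10.0.0.59", "10.0.0.187"],
    "sourcehostname": ["computer1", "computer2"],
    "sourceusername": ["user1", "user1"],
    "firstseen": pd.to_datetime(["2019-08-01 09:37:59", "2019-08-01 00:00:03"]),
    "lastseen": pd.to_datetime(["2019-08-01 09:46:08", "2019-08-01 23:59:58"]),
})

print(df1.dtypes)
print(df2.dtypes)
```

If the real columns were loaded as strings, passing them through pd.to_datetime first would be needed before any range comparison.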

Desired result:

df3
       sourceaddress  sourcehostname  sourceusername        endtime             datetime
    0      10.0.0.59       computer1             NaN  1564666638000  2019-08-01 09:37:18
    1      10.0.0.59             NaN             NaN  1564666640000  2019-08-01 09:37:20
    2      10.0.0.59             NaN             NaN  1564666642000  2019-08-01 09:37:22
    3      10.0.0.59             NaN             NaN  1564666643000  2019-08-01 09:37:23
    4      10.0.0.59             NaN             NaN  1564666643000  2019-08-01 09:37:23
    5      10.0.0.59             NaN             NaN  1564666645000  2019-08-01 09:37:25
    6      10.0.0.59       computer1             NaN  1564666646000  2019-08-01 09:37:26
    7      10.0.0.59             NaN             NaN  1564666646000  2019-08-01 09:37:26
    8      10.0.0.59       computer1             NaN  1564666649000  2019-08-01 09:37:29
    9      10.0.0.59       computer1             NaN  1564666650000  2019-08-01 09:37:30
   10      10.0.0.59       computer1           user1  1564666850000  2019-08-01 09:40:50
...
43196     10.0.0.187       computer2           user1  1564718395000  2019-08-01 23:59:55
43197     10.0.0.187       computer2           user1  1564718397000  2019-08-01 23:59:57
43198     10.0.0.187       computer2           user1  1564718397000  2019-08-01 23:59:57
43199     10.0.0.187       computer2           user1  1564718398000  2019-08-01 23:59:58
43200     10.0.0.187       computer2           user1  1564718398000  2019-08-01 23:59:58
43201     10.0.0.187       computer2           user1  1564718398000  2019-08-01 23:59:58

Following the example below:

df3[-5:]
       sourceaddress  sourcehostname  sourceusername        endtime             datetime            firstseen             lastseen
43197    10.99.0.187       computer2           user1  1564718397000  2019-08-01 23:59:57  2019-08-01 00:00:03  2019-08-01 23:59:58
43198    10.99.0.187       computer2             NaN  1564718397000  2019-08-01 23:59:57  2019-08-01 00:00:03  2019-08-01 23:59:58
43199    10.99.0.187       computer2             NaN  1564718398000  2019-08-01 23:59:58  2019-08-01 00:00:03  2019-08-01 23:59:58
43200    10.99.0.187       computer2           user1  1564718398000  2019-08-01 23:59:58  2019-08-01 00:00:03  2019-08-01 23:59:58
43201    10.99.0.187       computer2           user1  1564718398000  2019-08-01 23:59:58  2019-08-01 00:00:03  2019-08-01 23:59:58

Best answer

This looks like a merge problem:

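df3 = df1.merge(df2,
                on='sourceaddress', how='left',
                suffixes=('', '_df2'))

# mark the rows whose datetime falls inside [firstseen, lastseen];
# the upper bound is inclusive so that rows stamped exactly at lastseen
# (e.g. 23:59:58 in the desired result) are filled too
mask = df3['datetime'].ge(df3['firstseen']) & df3['datetime'].le(df3['lastseen'])

# overwrite hostname and username with the df2 values on those rows
df3.loc[mask, 'sourcehostname'] = df3.loc[mask, 'sourcehostname_df2']
df3.loc[mask, 'sourceusername'] = df3.loc[mask, 'sourceusername_df2']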

You can then drop the sourcehostname_df2 and sourceusername_df2 columns.
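
The merge, mask and update steps can be tried end to end on a small sample. The sketch below is not the answerer's exact code: the four df1 rows are copied from the question, the range test uses Series.between (inclusive on both ends by default) so that a row stamped exactly at lastseen is filled, and firstseen and lastseen are dropped along with the two _df2 columns, which is an assumption about what the final frame should look like.

```python
import numpy as np
import pandas as pd

# Four rows taken from the question's df1 (indices 0, 10, 43197, 43200).
df1 = pd.DataFrame({
    "sourceaddress": ["10.0.0.59", "10.0.0.59", "10.0.0.187", "10.0.0.187"],
    "sourcehostname": ["computer1", np.nan, "computer2", np.nan],
    "sourceusername": [np.nan, np.nan, "user1", np.nan],
    "endtime": [1564666638000, 1564666850000, 1564718397000, 1564718398000],
    "datetime": pd.to_datetime([
        "2019-08-01 09:37:18", "2019-08-01 09:40:50",
        "2019-08-01 23:59:57", "2019-08-01 23:59:58",
    ]),
})
df2 = pd.DataFrame({
    "sourceaddress": ["10.0.0.59", "10.0.0.187"],
    "sourcehostname": ["computer1", "computer2"],
    "sourceusername": ["user1", "user1"],
    "firstseen": pd.to_datetime(["2019-08-01 09:37:59", "2019-08-01 00:00:03"]),
    "lastseen": pd.to_datetime(["2019-08-01 09:46:08", "2019-08-01 23:59:58"]),
})

# Left merge keeps every df1 row; df2's overlapping columns get the _df2 suffix.
df3 = df1.merge(df2, on="sourceaddress", how="left", suffixes=("", "_df2"))

# True where datetime lies in [firstseen, lastseen], both ends included.
in_range = df3["datetime"].between(df3["firstseen"], df3["lastseen"])

# Overwrite hostname and username with df2's values on the matching rows.
for col in ("sourcehostname", "sourceusername"):
    df3.loc[in_range, col] = df3.loc[in_range, col + "_df2"]

# Remove the columns that were only needed for the lookup.
df3 = df3.drop(columns=["sourcehostname_df2", "sourceusername_df2",
                        "firstseen", "lastseen"])
print(df3)
```

Two caveats apply. If df2 held several time ranges for the same sourceaddress, the left merge would repeat each df1 row once per range, so the result would need de-duplicating. And because the update overwrites rather than fills, any value already present in df1 on an in-range row is replaced by the df2 value.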

Regarding python - Fill df1 with data from df2 using the IP column and date range of the two DataFrames, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57331118/
