python - Check whether a list of dates falls within a list of date ranges

Reposted. Author: 行者123. Updated: 2023-12-04 15:03:49

I have two dataframes extracted from a large hotel database:

  1. Customer purchase history dataframe (df_hist)
    customer_id   item    date
    1234          milk    2012-04-20
    1234          sugar   2012-05-01
    5678          salt    2017-07-15
    5678          water   2017-08-10
  2. Customer visit history dataframe (df_visit)
    customer_id   start        end          visit
    1234          2012-04-06   2012-04-25   1
    5678          2017-07-10   2017-07-20   5
    5678          2017-08-05   2017-08-11   6

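For each item in the purchase history, I want to find the visit number of the visit whose [start, end] range contains the purchase date (null if no visit contains it).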

  3. Result (df_result):
    customer_id   item    date         visit
    1234          milk    2012-04-20   1
    1234          sugar   2012-05-01   null
    5678          salt    2017-07-15   5
    5678          water   2017-08-10   6

I tried using multiple for loops, but that doesn't scale: df_visit has nearly 6 million rows covering about 15,000 unique customers. What is a more efficient way to do this?
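For reference, here is a minimal sketch of the kind of loop-based approach the question describes. The function name `find_visit_naive` is hypothetical and not taken from the question. For every purchase it scans every visit of the same customer, so the work grows with purchases × visits per customer. That is why this approach does not scale to millions of rows. It is still handy as a correctness baseline on small samples.

```python
import pandas as pd


def find_visit_naive(df_hist: pd.DataFrame, df_visit: pd.DataFrame) -> pd.Series:
    """Hypothetical baseline: for each purchase, scan all visits of that customer."""
    visits = []
    for _, purchase in df_hist.iterrows():
        found = pd.NA  # stays <NA> if no visit range contains the purchase date
        same_customer = df_visit[df_visit['customer_id'] == purchase['customer_id']]
        for _, v in same_customer.iterrows():
            if v['start'] <= purchase['date'] <= v['end']:
                found = v['visit']
                break
        visits.append(found)
    # Nullable integer dtype keeps visit numbers as integers alongside <NA>
    return pd.Series(visits, index=df_hist.index, dtype='Int64')


df_hist = pd.DataFrame({
    'customer_id': [1234, 1234, 5678, 5678],
    'item': ['milk', 'sugar', 'salt', 'water'],
    'date': pd.to_datetime(['2012-04-20', '2012-05-01', '2017-07-15', '2017-08-10']),
})
df_visit = pd.DataFrame({
    'customer_id': [1234, 5678, 5678],
    'start': pd.to_datetime(['2012-04-06', '2017-07-10', '2017-08-05']),
    'end': pd.to_datetime(['2012-04-25', '2017-07-20', '2017-08-11']),
    'visit': [1, 5, 6],
})

df_result = df_hist.assign(visit=find_visit_naive(df_hist, df_visit))
print(df_result)
```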

Best answer

Here is one approach:

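import io

import pandas as pd

d1 = io.StringIO("""
customer_id item date
1234 milk 2012-04-20
1234 sugar 2012-05-01
5678 salt 2017-07-15
5678 water 2017-08-10
""")

d2 = io.StringIO("""
customer_id start end visit
1234 2012-04-06 2012-04-25 1
5678 2017-07-10 2017-07-20 5
5678 2017-08-05 2017-08-11 6
""")

# Raw string for the regex separator avoids an invalid-escape warning
df1 = pd.read_csv(d1, sep=r'\s+', parse_dates=['date'])
df2 = pd.read_csv(d2, sep=r'\s+', parse_dates=['start', 'end'])

# merge_asof requires both frames to be sorted by their join keys
df1 = df1.sort_values('date')
df2 = df2.sort_values('start')

# For each purchase, take the same customer's latest visit that started on or before the purchase date
merged = pd.merge_asof(df1, df2, left_on='date', right_on='start',
                       by='customer_id', direction='backward')

# The matched visit only counts if the purchase date is also on or before its end date
# (rows with no match have NaT, and comparisons with NaT are False)
mask_dates = (merged['end'] >= merged['date']) & (merged['date'] >= merged['start'])

# Blank out visit numbers outside the range; Int64 shows them as <NA> instead of float NaN
merged['visit'] = merged['visit'].where(mask_dates).astype('Int64')

print(merged[['customer_id', 'item', 'date', 'visit']])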
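If memory allows, another way to express the same lookup is a plain merge on `customer_id` followed by a date-range filter. This approach does not need sorted inputs. It is a swapped-in technique, not part of the answer above, and the helper name `find_visit_by_merge` is hypothetical. The drawback is that it first builds one row per (purchase, visit of that customer) pair. With roughly 6 million visits over 15,000 customers, that is about 400 visits per customer on average, so the intermediate frame can get large. `merge_asof` avoids that blow-up because it matches each purchase to at most one visit.

```python
import pandas as pd


def find_visit_by_merge(df_hist: pd.DataFrame, df_visit: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pair purchases with visits per customer, then filter by date."""
    # Tag each purchase with a row id so it can be joined back after filtering
    hist = df_hist.reset_index(drop=True).assign(row=range(len(df_hist)))
    # One row for every (purchase, visit of the same customer) pair
    pairs = hist.merge(df_visit, on='customer_id', how='inner')
    # Keep only pairs where the purchase date falls inside the visit range
    inside = pairs[(pairs['start'] <= pairs['date']) & (pairs['date'] <= pairs['end'])]
    # If visits ever overlapped, keep only the first match per purchase
    inside = inside.drop_duplicates(subset='row')
    # Left-join back so purchases outside every visit get a missing visit number
    result = hist.merge(inside[['row', 'visit']], on='row', how='left')
    result['visit'] = result['visit'].astype('Int64')
    return result.drop(columns='row')


df_hist = pd.DataFrame({
    'customer_id': [1234, 1234, 5678, 5678],
    'item': ['milk', 'sugar', 'salt', 'water'],
    'date': pd.to_datetime(['2012-04-20', '2012-05-01', '2017-07-15', '2017-08-10']),
})
df_visit = pd.DataFrame({
    'customer_id': [1234, 5678, 5678],
    'start': pd.to_datetime(['2012-04-06', '2017-07-10', '2017-08-05']),
    'end': pd.to_datetime(['2012-04-25', '2017-07-20', '2017-08-11']),
    'visit': [1, 5, 6],
})

print(find_visit_by_merge(df_hist, df_visit))
```

On a small sample, comparing this output with the `merge_asof` result is a quick way to check both.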

A similar question, python - Check whether a list of dates falls within a list of date ranges, can be found on Stack Overflow: https://stackoverflow.com/questions/66505578/