python - Pandas DataFrame - For each row, return the count of other rows with overlapping dates

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:54:41
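I have a dataframe with projects, start dates, and end dates. For each row, I want to return the number of other projects that were in progress when that project started. How do you nest loops when using df.apply()? I have tried a for loop, but my dataframe is large and it takes too long.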

import datetime as dt

import pandas as pd

data = {'project': ['A', 'B', 'C'],
        'pr_start_date': [dt.datetime(2018, 9, 1), dt.datetime(2019, 4, 1), dt.datetime(2019, 6, 8)],
        'pr_end_date': [dt.datetime(2019, 6, 15), dt.datetime(2019, 12, 1), dt.datetime(2019, 8, 1)]}

df = pd.DataFrame(data)

def cons_overlap(start):
    # Count the projects that are in progress at the given start date.
    overlaps = 0
    for i in df.index:
        other_start = df.loc[i, 'pr_start_date']
        other_end = df.loc[i, 'pr_end_date']
        # Strict comparisons, so a project never counts itself.
        if (start > other_start) & (start < other_end):
            overlaps += 1

    return overlaps

df['overlap'] = df.apply(lambda row: cons_overlap(row['pr_start_date']), axis=1)

This is the output I am looking for:

  project pr_start_date pr_end_date  overlap
0       A    2018-09-01  2019-06-15        0
1       B    2019-04-01  2019-12-01        1
2       C    2019-06-08  2019-08-01        2

Best answer

I suggest you take advantage of numpy broadcasting:

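# ends[i, j] is True when project j starts before project i ends
ends = df.pr_start_date.values < df.pr_end_date.values[:, None]
# starts[i, j] is True when project j starts after project i starts
starts = df.pr_start_date.values > df.pr_start_date.values[:, None]
# Column j counts the projects i that are in progress when project j starts
df['overlap'] = (ends & starts).sum(0)
print(df)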

Output

  project pr_start_date pr_end_date  overlap
0       A    2018-09-01  2019-06-15        0
1       B    2019-04-01  2019-12-01        1
2       C    2019-06-08  2019-08-01        2

Both ends and starts are 3x3 matrices whose entries are True where the condition holds:

# ends
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]

# starts
[[False  True  True]
 [False False  True]
 [False False False]]
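
How the [:, None] indexing produces these matrices may be easier to see with plain integers. The sketch below is only an illustration: the integer values are hypothetical stand-ins for the dates (same ordering as the sample data), and the names start_vals, end_vals, before_end, and after_start are not from the original answer.

```python
import numpy as np

# Hypothetical integer stand-ins for the three start and end dates.
start_vals = np.array([1, 4, 6])
end_vals = np.array([7, 12, 8])

# end_vals[:, None] has shape (3, 1); comparing it with a shape (3,)
# array broadcasts both operands to shape (3, 3).
before_end = start_vals < end_vals[:, None]      # [i, j]: start j < end i
after_start = start_vals > start_vals[:, None]   # [i, j]: start j > start i

print(end_vals[:, None].shape)
print(before_end)
print(after_start)
```

Row i of each matrix corresponds to the "other" project and column j to the project whose start date is being tested.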

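Then take the intersection with the logical & and sum along axis 0 (sum(0)), which gives one count per column, that is, per project.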
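
Each comparison matrix holds n x n booleans, so memory use grows quadratically with the number of rows. For a very large dataframe, one possible alternative is to count with sorted arrays and np.searchsorted instead. This is not part of the original answer. It is a minimal sketch that assumes every project starts strictly before it ends, and count_running_at_start is a hypothetical helper name.

```python
import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'project': ['A', 'B', 'C'],
    'pr_start_date': [dt.datetime(2018, 9, 1), dt.datetime(2019, 4, 1), dt.datetime(2019, 6, 8)],
    'pr_end_date': [dt.datetime(2019, 6, 15), dt.datetime(2019, 12, 1), dt.datetime(2019, 8, 1)],
})

def count_running_at_start(frame):
    """Hypothetical helper: projects in progress at each project's start.

    Assumes pr_start_date < pr_end_date on every row.
    """
    starts = frame['pr_start_date'].to_numpy()
    ends = frame['pr_end_date'].to_numpy()
    sorted_starts = np.sort(starts)
    sorted_ends = np.sort(ends)
    # Projects that started strictly before each start date.
    started_before = np.searchsorted(sorted_starts, starts, side='left')
    # Projects that had already ended on or before each start date.
    ended_by = np.searchsorted(sorted_ends, starts, side='right')
    # Started but not yet ended means in progress.
    return started_before - ended_by

df['overlap'] = count_running_at_start(df)
print(df)
```

The two sorts and binary searches take O(n log n) time and O(n) memory, and on the sample data this yields the same overlap column as the broadcasting version.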

Regarding python - Pandas DataFrame - For each row, return the count of other rows with overlapping dates, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58293218/
