
python - How to drop duplicates with python based on a specific condition of date proximity?

Reposted. Author: 行者123. Updated: 2023-12-01 07:28:35

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame(columns=['Name', 'Change Date', 'Final Date'])
df['Name'] = ['Alexandra', 'Alexandra', 'Alexandra', 'Alexandra', 'Bobby', 'Bobby']
df['Change Date'] = ['2019-04-12', '2019-04-28', '2019-05-21', '2019-05-30', '2019-03-11', '2019-03-27']
df['Final Date'] = ['2019-04-15', '2019-04-15', '2019-05-27', '2019-05-27', '2019-03-20', '2019-03-20']

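I want to drop the duplicates and keep only the row whose Change Date is closest to each Final Date, which should give the following DataFrame: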

df = pd.DataFrame(columns=['Name', 'Change Date', 'Final Date'])
df['Name'] = ['Alexandra', 'Alexandra', 'Bobby']
df['Change Date'] = ['2019-04-12', '2019-05-30', '2019-03-27']
df['Final Date'] = ['2019-04-15', '2019-05-27', '2019-03-20']

Best answer

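Convert both columns to datetimes, subtract them with Series.sub and take the absolute value with Series.abs. Finally, use DataFrameGroupBy.idxmin to get the index of the minimum in each group and select the original rows with DataFrame.loc: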

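# Parse the string columns as datetimes
df['Final Date'] = pd.to_datetime(df['Final Date'])
df['Change Date'] = pd.to_datetime(df['Change Date'])
# Absolute gap between the two dates, regardless of direction
df['diff'] = df['Final Date'].sub(df['Change Date']).abs()

# For each (Name, Final Date) group, select the row with the smallest gap
df1 = df.loc[df.groupby(['Name','Final Date'])['diff'].idxmin()]
print(df1)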
        Name Change Date Final Date   diff
0  Alexandra  2019-04-12 2019-04-15 3 days
3  Alexandra  2019-05-30 2019-05-27 3 days
5      Bobby  2019-03-27 2019-03-20 7 days
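For a self-contained check, here is a minimal sketch that rebuilds the question's data and wraps the same steps in a function. The helper name keep_closest is hypothetical; it works on a copy so the caller's DataFrame is left unchanged.

```python
import pandas as pd

# Sample data from the question (dates start out as strings)
df = pd.DataFrame({
    'Name': ['Alexandra', 'Alexandra', 'Alexandra', 'Alexandra', 'Bobby', 'Bobby'],
    'Change Date': ['2019-04-12', '2019-04-28', '2019-05-21', '2019-05-30', '2019-03-11', '2019-03-27'],
    'Final Date': ['2019-04-15', '2019-04-15', '2019-05-27', '2019-05-27', '2019-03-20', '2019-03-20'],
})

def keep_closest(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep one row per (Name, Final Date), the one with the nearest Change Date."""
    out = frame.copy()
    out['Final Date'] = pd.to_datetime(out['Final Date'])
    out['Change Date'] = pd.to_datetime(out['Change Date'])
    # Absolute gap, whether the change happened before or after the Final Date
    out['diff'] = out['Final Date'].sub(out['Change Date']).abs()
    # idxmin returns the index label of the first minimum in each group
    idx = out.groupby(['Name', 'Final Date'])['diff'].idxmin()
    return out.loc[idx]

result = keep_closest(df)
print(result)
```

Because idxmin returns a single label per group, exactly one row survives for each (Name, Final Date) pair even when two rows are equally close.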

If a group may contain duplicate minimum values and you want to keep all of them, use:

df1 = df[df.groupby(['Name','Final Date'])['diff'].transform('min').eq(df['diff'])]
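To see how the two variants behave on ties, here is a small sketch with hypothetical data: an extra Bobby row dated 2019-03-13 is invented so that two Change Dates sit exactly 7 days from the Final Date.

```python
import pandas as pd

# Hypothetical data: 2019-03-13 and 2019-03-27 are both 7 days from 2019-03-20
df = pd.DataFrame({
    'Name': ['Bobby', 'Bobby', 'Bobby'],
    'Change Date': pd.to_datetime(['2019-03-11', '2019-03-13', '2019-03-27']),
    'Final Date': pd.to_datetime(['2019-03-20', '2019-03-20', '2019-03-20']),
})
df['diff'] = df['Final Date'].sub(df['Change Date']).abs()

# idxmin keeps only the first row that reaches the group minimum
one_per_group = df.loc[df.groupby(['Name', 'Final Date'])['diff'].idxmin()]

# transform('min') broadcasts the group minimum back to every row,
# so comparing against it keeps all tied rows
group_min = df.groupby(['Name', 'Final Date'])['diff'].transform('min')
all_ties = df[group_min.eq(df['diff'])]

print(one_per_group)
print(all_ties)
```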

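Alternatively, if you need to group by the Name column only, so that both of Alexandra's rows with the minimal 3 days are selected, use GroupBy.transform to build a Series of per-group minimums, compare it with diff, and finally filter with boolean indexing: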

df1 = df[df.groupby('Name')['diff'].transform('min').eq(df['diff'])]
print(df1)
        Name Change Date Final Date   diff
0  Alexandra  2019-04-12 2019-04-15 3 days
3  Alexandra  2019-05-30 2019-05-27 3 days
5      Bobby  2019-03-27 2019-03-20 7 days
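If you prefer to express this literally as dropping duplicates, one alternative (not the method from the answer above) is to sort by the gap and keep the first row per group with drop_duplicates. A minimal sketch, assuming the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alexandra', 'Alexandra', 'Alexandra', 'Alexandra', 'Bobby', 'Bobby'],
    'Change Date': ['2019-04-12', '2019-04-28', '2019-05-21', '2019-05-30', '2019-03-11', '2019-03-27'],
    'Final Date': ['2019-04-15', '2019-04-15', '2019-05-27', '2019-05-27', '2019-03-20', '2019-03-20'],
})
df['Change Date'] = pd.to_datetime(df['Change Date'])
df['Final Date'] = pd.to_datetime(df['Final Date'])
df['diff'] = df['Final Date'].sub(df['Change Date']).abs()

# Put the smallest gap first; mergesort is stable, so tied rows keep their original order
# and the first one kept matches what idxmin would pick.
deduped = (
    df.sort_values('diff', kind='mergesort')
      .drop_duplicates(subset=['Name', 'Final Date'], keep='first')
      .sort_index()  # restore the original row order
)
print(deduped)
```

Like idxmin, this keeps exactly one row per (Name, Final Date) group.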

Regarding "python - How to drop duplicates with python based on a specific condition of date proximity?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57321272/
