
python - Calculating the difference between 2 dates in a pandas GroupBy object built from those same 2 dates

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:09:57

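I am trying to create a new pandas.DataFrame column that contains the number of business days between two date columns. I cannot pass the date columns themselves as arguments in the function call (I get TypeError: Cannot convert input). I can, however, zip the values of the two Series into a list and use a for loop to reference the arguments. Ideally, I would prefer to create a GroupBy object from the two date columns and compute the difference.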

Create the DataFrame:

import pandas as pd

df = pd.DataFrame.from_dict({'Date1': ['2017-05-30 16:00:00',
                                       '2017-05-30 16:00:00',
                                       '2017-05-30 16:00:00'],
                             'Date2': ['2017-06-16 16:00:00',
                                       '2017-07-21 16:00:00',
                                       '2017-08-18 16:00:00'],
                             'Value1': [2.97, 3.3, 4.03],
                             # Plain ints: the Python 2 long suffix (96L) is a syntax error in Python 3
                             'Value2': [96, 14, 2]})

df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])

df.dtypes

Verify the DataFrame:

Date1     datetime64[ns]
Date2     datetime64[ns]
Value1           float64
Value2             int64
dtype: object

Define the function:

def date_diff(startDate, endDate):
    # Business days from startDate to endDate; expects scalar dates, not Series
    return float(len(pd.bdate_range(startDate, endDate)) - 1)

Attempt to build the column from the result of the date_diff function call:

df['DateDiff'] = date_diff(df['Date1'], df['Date2'])

The TypeError:

TypeError: Cannot convert input [0   2017-05-30 16:00:00
1 2017-05-30 16:00:00
2 2017-05-30 16:00:00
Name: Date1, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp

A for loop over a list of tuples containing the dates works:

date_List = list(zip(df['Date1'], df['Date2']))

for start, end in date_List:
    # Select the rows matching this date pair and store their business-day difference
    df.loc[(df['Date1'] == start) & (df['Date2'] == end), 'diff'] = date_diff(start, end)

                Date1               Date2  Value1  Value2  diff
0 2017-05-30 16:00:00 2017-06-16 16:00:00    2.97      96  13.0
1 2017-05-30 16:00:00 2017-07-21 16:00:00    3.30      14  38.0
2 2017-05-30 16:00:00 2017-08-18 16:00:00    4.03       2  58.0
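The TypeError appears because pd.bdate_range takes one start and one end date, while the failing call handed it two whole Series. As a minimal sketch (not necessarily how the asker would write it), the same scalar function can be applied pair by pair with zip, which avoids the boolean mask used in the loop above. The column name DateDiff follows the question; everything else is the question's own data.

```python
import pandas as pd

def date_diff(start_date, end_date):
    # Scalar version from the question: business days between two single dates
    return float(len(pd.bdate_range(start_date, end_date)) - 1)

df = pd.DataFrame({
    'Date1': pd.to_datetime(['2017-05-30 16:00:00'] * 3),
    'Date2': pd.to_datetime(['2017-06-16 16:00:00',
                             '2017-07-21 16:00:00',
                             '2017-08-18 16:00:00']),
})

# Call the scalar function once per row; zip yields (Timestamp, Timestamp) pairs
df['DateDiff'] = [date_diff(s, e) for s, e in zip(df['Date1'], df['Date2'])]
print(df)
```

This still runs one Python-level call per row, so it is likely to be slow on large frames.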

Ideally, I would like to use a GroupBy object (grouped by Date1 and Date2):

grp = df.groupby(['Date1', 'Date2'])

Desired output:

[((Timestamp('2017-05-30 16:00:00'), Timestamp('2017-06-16 16:00:00')),
Date1 Date2 Value1 Value2 diff
0 2017-05-30 16:00:00 2017-06-16 16:00:00 2.97 96 13.0),
((Timestamp('2017-05-30 16:00:00'), Timestamp('2017-07-21 16:00:00')),
Date1 Date2 Value1 Value2 diff
1 2017-05-30 16:00:00 2017-07-21 16:00:00 3.3 14 38.0),
((Timestamp('2017-05-30 16:00:00'), Timestamp('2017-08-18 16:00:00')),
Date1 Date2 Value1 Value2 diff
2 2017-05-30 16:00:00 2017-08-18 16:00:00 4.03 2 58.0)]
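One possible way to get output of this shape, sketched under the assumption that the scalar date_diff from the question is kept: iterating over the GroupBy object yields each (Date1, Date2) key together with its sub-frame, so the key can be passed straight to date_diff and the result written back through the sub-frame's index. The variable names d1, d2, frame and groups are illustrative only.

```python
import pandas as pd

def date_diff(start_date, end_date):
    # Scalar version from the question
    return float(len(pd.bdate_range(start_date, end_date)) - 1)

df = pd.DataFrame({
    'Date1': pd.to_datetime(['2017-05-30 16:00:00'] * 3),
    'Date2': pd.to_datetime(['2017-06-16 16:00:00',
                             '2017-07-21 16:00:00',
                             '2017-08-18 16:00:00']),
    'Value1': [2.97, 3.3, 4.03],
    'Value2': [96, 14, 2],
})

# Each group key is a (Date1, Date2) tuple, so the function runs once per distinct pair
for (d1, d2), frame in df.groupby(['Date1', 'Date2']):
    df.loc[frame.index, 'diff'] = date_diff(d1, d2)

# Regroup after adding the column; list() gives (key, sub-frame) pairs
groups = list(df.groupby(['Date1', 'Date2']))
for key, frame in groups:
    print(key)
    print(frame)
```

Compared with the row-wise loop, this calls date_diff only once per unique date pair, which may help when many rows share the same dates.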

Best answer

You need to convert the type to datetime64[D] to keep numpy happy, for example:

Code:

import numpy as np

def date_diff(start_dates, end_dates):
    # np.busday_count needs day-resolution dates, so cast datetime64[ns] to datetime64[D]
    return np.busday_count(
        start_dates.values.astype('datetime64[D]'),
        end_dates.values.astype('datetime64[D]'))
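A short sketch of how np.busday_count relates to the original len(pd.bdate_range(...)) - 1 formula may be useful here. np.busday_count counts business days in the half-open interval from the start date up to, but not including, the end date, so the two agree when both endpoints are business days. They can differ when the end date falls on a weekend, as the second pair below illustrates. The dates in the second pair are chosen for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Both endpoints are business days (Tuesday to Friday, from the question's first row)
count = np.busday_count(np.datetime64('2017-05-30'), np.datetime64('2017-06-16'))
inclusive = len(pd.bdate_range('2017-05-30', '2017-06-16'))
print(count, inclusive - 1)

# Illustrative pair: Friday to Saturday. busday_count includes the Friday,
# while the bdate_range formula finds one business day and subtracts one.
weekend_count = np.busday_count(np.datetime64('2017-06-16'), np.datetime64('2017-06-17'))
weekend_range = len(pd.bdate_range('2017-06-16', '2017-06-17')) - 1
print(weekend_count, weekend_range)
```

The cast to datetime64[D] also drops the time of day, so 16:00:00 plays no part in the count.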

Test code:

import pandas as pd
df = pd.DataFrame.from_dict({'Date1': ['2017-05-30 16:00:00',
                                       '2017-05-30 16:00:00',
                                       '2017-05-30 16:00:00'],
                             'Date2': ['2017-06-16 16:00:00',
                                       '2017-07-21 16:00:00',
                                       '2017-08-18 16:00:00'],
                             'Value1': [2.97, 3.3, 4.03],
                             # Plain ints: the Python 2 long suffix (96L) is a syntax error in Python 3
                             'Value2': [96, 14, 2]})

df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])

df['DateDiff'] = date_diff(df['Date1'], df['Date2'])
print(df)

Result:

                Date1               Date2  Value1  Value2  DateDiff
0 2017-05-30 16:00:00 2017-06-16 16:00:00    2.97      96        13
1 2017-05-30 16:00:00 2017-07-21 16:00:00    3.30      14        38
2 2017-05-30 16:00:00 2017-08-18 16:00:00    4.03       2        58
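Two details of this result differ from the question: the counts are integers rather than floats, and no holidays are excluded. If either matters, a sketch along the following lines might help. It uses the holidays parameter of np.busday_count; the holiday list itself is an assumption made purely for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date1': pd.to_datetime(['2017-05-30 16:00:00'] * 3),
    'Date2': pd.to_datetime(['2017-06-16 16:00:00',
                             '2017-07-21 16:00:00',
                             '2017-08-18 16:00:00']),
})

start = df['Date1'].values.astype('datetime64[D]')
end = df['Date2'].values.astype('datetime64[D]')

# Assumed holiday list, for illustration only (a Tuesday inside the last two ranges)
holidays = ['2017-07-04']

# astype(float) matches the float column the question produced
df['DateDiff'] = np.busday_count(start, end, holidays=holidays).astype(float)
print(df)
```

With this assumed holiday, the second and third rows each drop by one day while the first row is unchanged.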

This topic (python - calculating the difference between 2 dates in a pandas GroupBy object built from those same 2 dates) comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/44294755/
