
python - Pandas error when creating TimeDeltas from Datetime operations

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 11:49:12

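I have looked at several other related questions here, here, and here, but none of them ran into exactly the same problem I have.

I am using Pandas version 0.16.2. I have several columns of dtype datetime64[ns] in a Pandas DataFrame: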

In [6]: date_list = ["SubmittedDate","PolicyStartDate", "PaidUpDate", "MaturityDate", "DraftDate", "CurrentValuationDate", "DOB", "InForceDate"]

In [11]: data[date_list].head()

Out[11]:
   SubmittedDate  PolicyStartDate  PaidUpDate  MaturityDate  DraftDate  \
0            NaT       2002-11-18         NaT    2041-03-04        NaT
1            NaT       2015-01-13         NaT           NaT        NaT
2            NaT       2014-10-15         NaT           NaT        NaT
3            NaT       2009-08-27         NaT           NaT        NaT
4            NaT       2007-04-19         NaT    2013-10-01        NaT

   CurrentValuationDate         DOB  InForceDate
0            2015-04-30  1976-03-04   2002-11-18
1                   NaT  1949-09-27   2015-01-13
2                   NaT  1947-06-15   2014-10-15
3            2015-07-30  1960-06-07   2009-08-27
4            2010-04-21  1950-10-01   2007-04-19

These were originally strings (for example "1976-03-04"), which I converted to datetime objects using:

In [7]: for datecol in date_list:
   ...:     data[datecol] = pd.to_datetime(data[datecol], coerce=True, errors='raise')
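
For readers on a newer pandas: the `coerce=True` keyword used above was replaced in later releases by `errors='coerce'`. The following is a minimal sketch of the equivalent conversion, assuming a current pandas release. The two-column frame and its values are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data standing in for the original frame.
data = pd.DataFrame({
    "DOB": ["1976-03-04", "1949-09-27", None],
    "PaidUpDate": [None, None, None],
})
date_list = ["DOB", "PaidUpDate"]

for datecol in date_list:
    # errors="coerce" turns unparseable or missing entries into NaT
    # instead of raising an exception.
    data[datecol] = pd.to_datetime(data[datecol], errors="coerce")

print(data.dtypes)
```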

Here is the dtype of each column:

In [8]: for datecol in date_list:
   ...:     print(data[datecol].dtypes)

This returns:

datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]

So far, so good. What I want to do, though, is create a new column for each of these columns giving the age in days relative to a fixed date.

In [13]: current_date = pd.to_datetime("2015-07-31")

I first ran this:

In [14]: for i in date_list:
   ....:     data[i+"InDays"] = data[i].apply(lambda x: current_date - x)

However, when I check the dtypes of the resulting columns:

In [15]: for datecol in date_list:
   ....:     print(data[datecol + "InDays"].dtypes)

I get these:

object
timedelta64[ns]
object
timedelta64[ns]
object
timedelta64[ns]
timedelta64[ns]
timedelta64[ns]

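I don't know why three of them are object when they should be timedeltas. The next thing I want to do is:

In [16]: for i in date_list:
   ....:     data[i+"InDays"] = data[i+"InDays"].dt.days

This works for the timedelta columns. However, because three of the columns are not timedeltas, I get this error: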

AttributeError: Can only use .dt accessor with datetimelike values

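I suspect some values in those three columns are preventing Pandas from converting them to timedeltas. I don't know how to work out what those values might be.

Best answer

The issue occurs because three of your columns contain only NaT values, which causes those columns to be treated as object dtype when you apply the function to them.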
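
One way to confirm which columns are entirely NaT is to count the non-missing entries per column. This is a minimal sketch on a made-up two-column frame (the column values and the names `non_missing` and `all_nat_columns` are illustrative assumptions); it assumes a pandas version that provides `DataFrame.notna`.

```python
import pandas as pd

# Hypothetical frame: one populated date column, one column that is all NaT.
data = pd.DataFrame({
    "PolicyStartDate": pd.to_datetime(["2002-11-18", "2015-01-13"]),
    "PaidUpDate": pd.to_datetime([None, None]),
})

# Number of non-missing entries per column; 0 means the column is all NaT.
non_missing = data.notna().sum()
all_nat_columns = non_missing[non_missing == 0].index.tolist()
print(all_nat_columns)
```

In the question's data, the columns flagged this way should be the same three that came back as object.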

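You should put a condition in the apply step that defaults to some timedelta when the value is NaT. Example:

for i in date_list:
    # fall back to a zero timedelta for missing dates so the result stays timedelta64
    data[i+"InDays"] = data[i].apply(lambda x: current_date - x if x is not pd.NaT else pd.Timedelta(0))

Or, if you cannot do the above, put a condition around the place where you do data[i+"InDays"] = data[i+"InDays"].dt.days, so that it runs only when the dtype of the series allows it.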
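
A dtype guard of that kind might look like the sketch below. It assumes `pandas.api.types.is_timedelta64_dtype` is available (it is in current pandas releases), and the small frame is a made-up stand-in for two of the "...InDays" columns after the apply step. Filling the all-missing object column with NaN is a choice made here for illustration, not something the answer prescribes.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_timedelta64_dtype

# Hypothetical stand-ins for two "...InDays" columns after the apply step:
# one proper timedelta column and one object column holding only NaT.
data = pd.DataFrame({
    "DOBInDays": pd.to_timedelta([14393, 24048], unit="D"),
    "PaidUpDateInDays": pd.Series([pd.NaT, pd.NaT], dtype=object),
})

for col in ["DOBInDays", "PaidUpDateInDays"]:
    if is_timedelta64_dtype(data[col]):
        # Safe to use the .dt accessor on a real timedelta column.
        data[col] = data[col].dt.days
    elif data[col].isna().all():
        # Assumption: an all-missing column simply becomes NaN.
        data[col] = np.nan

print(data)
```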

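Or, more simply, change the apply step to get what you want directly:

for i in date_list:
    # compute the day count for real dates and pass NaT through unchanged
    data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else x)

This outputs: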

In [110]: data
Out[110]:
   SubmittedDate  PolicyStartDate  PaidUpDate  MaturityDate  DraftDate  \
0            NaT       2002-11-18         NaT    2041-03-04        NaT
1            NaT       2015-01-13         NaT           NaT        NaT
2            NaT       2014-10-15         NaT           NaT        NaT
3            NaT       2009-08-27         NaT           NaT        NaT
4            NaT       2007-04-19         NaT    2013-10-01        NaT

   CurrentValuationDate         DOB  InForceDate  SubmittedDateInDays  \
0            2015-04-30  1976-03-04   2002-11-18                  NaT
1                   NaT  1949-09-27   2015-01-13                  NaT
2                   NaT  1947-06-15   2014-10-15                  NaT
3            2015-07-30  1960-06-07   2009-08-27                  NaT
4            2010-04-21  1950-10-01   2007-04-19                  NaT

   PolicyStartDateInDays  PaidUpDateInDays  MaturityDateInDays  DraftDateInDays  \
0                   4638               NaT               -9348              NaT
1                    199               NaT                 NaN              NaT
2                    289               NaT                 NaN              NaT
3                   2164               NaT                 NaN              NaT
4                   3025               NaT                 668              NaT

   CurrentValuationDateInDays  DOBInDays  InForceDateInDays
0                          92      14393               4638
1                         NaN      24048                199
2                         NaN      24883                289
3                           1      20142               2164
4                        1927      23679               3025

If you want NaN instead of NaT, you can use:

# requires: import numpy as np
for i in date_list:
    data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.nan)

Example/demo:

In [114]: for i in date_list:
   .....:     data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.nan)
   .....:

In [115]: data
Out[115]:
   SubmittedDate  PolicyStartDate  PaidUpDate  MaturityDate  DraftDate  \
0            NaT       2002-11-18         NaT    2041-03-04        NaT
1            NaT       2015-01-13         NaT           NaT        NaT
2            NaT       2014-10-15         NaT           NaT        NaT
3            NaT       2009-08-27         NaT           NaT        NaT
4            NaT       2007-04-19         NaT    2013-10-01        NaT

   CurrentValuationDate         DOB  InForceDate  SubmittedDateInDays  \
0            2015-04-30  1976-03-04   2002-11-18                  NaN
1                   NaT  1949-09-27   2015-01-13                  NaN
2                   NaT  1947-06-15   2014-10-15                  NaN
3            2015-07-30  1960-06-07   2009-08-27                  NaN
4            2010-04-21  1950-10-01   2007-04-19                  NaN

   PolicyStartDateInDays  PaidUpDateInDays  MaturityDateInDays  \
0                   4638               NaN               -9348
1                    199               NaN                 NaN
2                    289               NaN                 NaN
3                   2164               NaN                 NaN
4                   3025               NaN                 668

   DraftDateInDays  CurrentValuationDateInDays  DOBInDays  InForceDateInDays
0              NaN                          92      14393               4638
1              NaN                         NaN      24048                199
2              NaN                         NaN      24883                289
3              NaN                           1      20142               2164
4              NaN                        1927      23679               3025
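
As an alternative to the element-wise apply, the subtraction can be done on the whole column at once, which keeps the result as a timedelta column even when every value is NaT. This is a sketch under assumptions, not part of the original answer: it assumes a reasonably recent pandas release, and the two-column frame is made up for illustration.

```python
import pandas as pd

current_date = pd.to_datetime("2015-07-31")

# Hypothetical frame: one populated date column, one column that is all NaT.
data = pd.DataFrame({
    "PolicyStartDate": pd.to_datetime(["2002-11-18", "2015-01-13"]),
    "PaidUpDate": pd.to_datetime([None, None]),
})

for col in ["PolicyStartDate", "PaidUpDate"]:
    # Timestamp minus a datetime64 column yields a timedelta64 column,
    # with NaT wherever the date is missing.
    delta = current_date - data[col]
    # .dt.days gives the whole-day count; missing entries come out as NaN.
    data[col + "InDays"] = delta.dt.days

print(data)
```

Because the dtype no longer depends on what apply infers from the individual values, the later `.dt.days` step does not need a special case for empty columns.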

Regarding python - Pandas error when creating TimeDeltas from Datetime operations, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32137330/
