
python - How to truncate minutes and seconds from a pandas.tslib.Timestamp in pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:48:05

I am using Cloudera VM 5.2 with pandas 0.18.0.

I have the following data:

import pandas as pd

adclicksDF = pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv',
                         parse_dates=['timestamp'],
                         skipinitialspace=True).assign(adCount=1)

adclicksDF.head(n=5)
Out[107]:
timestamp txId userSessionId teamId userId adId adCategory \
0 2016-05-26 15:13:22 5974 5809 27 611 2 electronics
1 2016-05-26 15:17:24 5976 5705 18 1874 21 movies
2 2016-05-26 15:22:52 5978 5791 53 2139 25 computers
3 2016-05-26 15:22:57 5973 5756 63 212 10 fashion
4 2016-05-26 15:22:58 5980 5920 9 1027 20 clothing



adCount
0 1
1 1
2 1
3 1
4 1

The column data types are:

for col in adclicksDF:
    print(col)
    print(type(adclicksDF[col][1]))


timestamp
<class 'pandas.tslib.Timestamp'>
txId
<class 'numpy.int64'>
userSessionId
<class 'numpy.int64'>
teamId
<class 'numpy.int64'>
userId
<class 'numpy.int64'>
adId
<class 'numpy.int64'>
adCategory
<class 'str'>
adCount
<class 'numpy.int64'>

I want to truncate the minutes and seconds from the timestamp.

I tried:

adclicksDF["timestamp"] = pd.to_datetime(adclicksDF["timestamp"],format='%Y-%m-%d %H')

adclicksDF.head(n=5)
Out[110]:
timestamp txId userSessionId teamId userId adId adCategory \
0 2016-05-26 15:13:22 5974 5809 27 611 2 electronics
1 2016-05-26 15:17:24 5976 5705 18 1874 21 movies
2 2016-05-26 15:22:52 5978 5791 53 2139 25 computers
3 2016-05-26 15:22:57 5973 5756 63 212 10 fashion
4 2016-05-26 15:22:58 5980 5920 9 1027 20 clothing

adCount
0 1
1 1
2 1
3 1
4 1

This does not truncate the minutes and seconds.

How can I truncate the minutes and seconds?

Best answer

You can use:

# Parentheses let the method chain continue onto the next line
adclicksDF["timestamp"] = (pd.to_datetime(adclicksDF["timestamp"])
                             .apply(lambda x: x.replace(minute=0, second=0)))


print (adclicksDF)
timestamp txId userSessionId teamId userId adId adCategory
0 2016-05-26 15:00:00 5974 5809 27 611 2 electronics
1 2016-05-26 15:00:00 5976 5705 18 1874 21 movies
2 2016-05-26 15:00:00 5978 5791 53 2139 25 computers
3 2016-05-26 15:00:00 5973 5756 63 212 10 fashion
4 2016-05-26 15:00:00 5980 5920 9 1027 20 clothing

print (type(adclicksDF.loc[0, 'timestamp']))
<class 'pandas.tslib.Timestamp'>

If you need the output as a string, use dt.strftime:

adclicksDF["timestamp"] = pd.to_datetime(adclicksDF["timestamp"]).dt.strftime('%Y-%m-%d %H')
print (adclicksDF)
timestamp txId userSessionId teamId userId adId adCategory
0 2016-05-26 15 5974 5809 27 611 2 electronics
1 2016-05-26 15 5976 5705 18 1874 21 movies
2 2016-05-26 15 5978 5791 53 2139 25 computers
3 2016-05-26 15 5973 5756 63 212 10 fashion
4 2016-05-26 15 5980 5920 9 1027 20 clothing

print (type(adclicksDF.loc[0, 'timestamp']))
<class 'str'>

Edit:

A better solution is to use dt.floor, as in Alex's answer.
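
Alex's answer is not reproduced here, so the following is only a minimal sketch of how dt.floor could be applied. It assumes a small hypothetical frame (df) that mirrors the first rows of the question's data instead of the original CSV file, and it uses the "h" frequency alias for hourly flooring.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's ad-clicks.csv
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-05-26 15:13:22",
        "2016-05-26 15:17:24",
        "2016-05-26 15:22:52",
    ]),
    "adCount": 1,
})

# Round every timestamp down to the start of its hour.
# The column stays a datetime column; no per-row Python function is called.
df["timestamp"] = df["timestamp"].dt.floor("h")

print(df)
```

Compared with apply plus replace, this works on the whole column at once and also clears any sub-second component, which replace(minute=0, second=0) alone would leave in place.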

Regarding "python - How to truncate minutes and seconds from a pandas.tslib.Timestamp in pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38271224/
