
python - date_parser function with read_csv does not work

Reposted. Author: 行者123. Updated: 2023-12-01 01:37:58

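I am reading 3 different datasets with pd.read_csv. One column of the data is a time in seconds, and I want to parse it with a function I wrote for the date_parser parameter of pd.read_csv. It works fine when all the data are integers. However, when the data contain strings or floats, my function does not work. I think the problem is in the datetime.datetime.fromtimestamp(float(time_in_secs)) part of my function. Does anyone know how to make it work for all of my datasets? I am completely stuck. Below is an example of what the 3 different datasets look like.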

Dataset 1

555, 1404803485, 800
555, 1408906759, 900

Dataset 2

231, 1404803485, pass
231, 1404803490, fail

Dataset 3

16010925, 1403890894, 40.5819880696
16010925, 1903929273, 40.5819880696

def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        if time_in_secs == '\\N':
            time_in_secs = 0

    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm


pd.read_csv('dataset_here.csv',
            delimiter=',', index_col=[0,1], parse_dates=['Timestamp'],
            date_parser=dateparse, names=['Serial', 'Timestamp', 'result'])

Best answer

I believe you need to convert every string time value to 0 so that float() works in your solution:

def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        # https://stackoverflow.com/a/45372194
        # time_in_secs = 86400
        time_in_secs = 0

    # print(time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm

A more general solution: try to convert the value to float, and assign a default value if that is not possible:

def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        try:
            time_in_secs = float(time_in_secs)
        except ValueError:
            # https://stackoverflow.com/a/45372194
            # time_in_secs = 86400
            time_in_secs = 0

    # print(time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm
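The parsing step can also be checked in isolation, without reading a CSV. The following is a minimal sketch, not part of the original answer: the name `floor_to_10_minutes` and the `default` and `tz` parameters are made up for illustration, and it assumes UTC rather than the local time that `fromtimestamp` uses in the answer, so that the result does not depend on the machine's timezone.

```python
import datetime


def floor_to_10_minutes(time_in_secs, default=0, tz=datetime.timezone.utc):
    """Convert epoch seconds to a datetime rounded down to 10 minutes.

    Hypothetical helper: `default` and `tz` are illustrative parameters,
    not something the original answer defines.
    """
    try:
        secs = float(time_in_secs)
    except (TypeError, ValueError):
        # Values such as 'test' or '\\N' cannot be converted to float.
        secs = default

    # Assumption: interpret the timestamp in UTC for reproducible results.
    tm = datetime.datetime.fromtimestamp(secs, tz=tz)
    # Drop the minutes past the last multiple of 10, plus seconds and microseconds.
    return tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)


print(floor_to_10_minutes('1404803485'))  # numeric string
print(floor_to_10_minutes('\\N'))         # falls back to the default
```

Catching `TypeError` as well as `ValueError` also covers inputs such as `None`, which `float()` rejects with a different exception.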

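Example, tested on Windows (the printed times depend on the machine's local timezone, because fromtimestamp returns local time):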

import io
import datetime

import pandas as pd

def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        try:
            time_in_secs = float(time_in_secs)
        except ValueError:
            # https://stackoverflow.com/a/45372194
            # time_in_secs = 0
            time_in_secs = 86400

    print(time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    # round down to the previous multiple of 10 minutes
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm

temp = u"""16010925,test,40.5819880696
16010925,1903929273,40.5819880696"""
# after testing, replace 'io.StringIO(temp)' with 'filename.csv'
# (pd.compat.StringIO was removed from newer pandas versions, so io.StringIO is used here)
df = pd.read_csv(io.StringIO(temp), index_col=[0,1], parse_dates=['Timestamp'],
                 date_parser=dateparse, names=['Serial', 'Timestamp', 'result'])

print(df)
                                 result
Serial   Timestamp
16010925 1970-01-02 01:00:00  40.581988
         2030-05-02 07:10:00  40.581988

print(df.index.get_level_values(1))
DatetimeIndex(['1970-01-02 01:00:00', '2030-05-02 07:10:00'],
              dtype='datetime64[ns]', name='Timestamp', freq=None)
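The date_parser parameter has been deprecated since pandas 2.0, so on recent versions it may be simpler to read the column as-is and convert it afterwards. The following is only a sketch of that alternative, not the method from the answer: it assumes the same fallback of 86400 seconds for unparseable values, and it interprets the timestamps as UTC, whereas fromtimestamp in the answer uses local time.

```python
import io

import pandas as pd

raw = """16010925,test,40.5819880696
16010925,1903929273,40.5819880696"""

# Read the timestamp column without any date parsing.
df = pd.read_csv(io.StringIO(raw), names=['Serial', 'Timestamp', 'result'])

# Non-numeric entries such as 'test' become NaN instead of raising an error.
secs = pd.to_numeric(df['Timestamp'], errors='coerce')

# Assumption: use 86400 seconds as the fallback, as in the example above.
secs = secs.fillna(86400).astype('int64')

# unit='s' treats the numbers as seconds since the epoch (UTC);
# floor('10min') rounds each value down to a multiple of 10 minutes.
df['Timestamp'] = pd.to_datetime(secs, unit='s').dt.floor('10min')
df = df.set_index(['Serial', 'Timestamp'])

print(df.index.get_level_values(1))
```

Because the conversion works on the whole column at once, no Python function is called per row. The resulting times are in UTC, so they differ from the local-time output shown above by the machine's UTC offset.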

Regarding "python - date_parser function with read_csv does not work", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52203246/
