
python - Pandas - Filling in missing times in time series data

Reposted. Author: 行者123. Updated: 2023-12-01 06:50:42

I have a pandas DataFrame like this:

    date_time    var1     var2    var3   var4    var6
20080322 0000 0 0 0 0 -11
20080322 0001 0 5 0 0 9
20080322 0003 5 0 0 0 0
20080322 0004 0 0 11 0 -9
20080322 0005 0 12 0 0 1
20080322 0009 7 0 0 4 5
20080322 0010 0 0 0 0 27

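Some minutes are missing from the data (0002, 0006, 0007, 0008). I am looking for a good way to insert the missing rows into the DataFrame. What I have tried so far: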

import pandas as pd
widths = [13,8,9,8,7,8]
df = pd.read_fwf("data", widths=widths)

df['date_time'] = pd.to_datetime(df['date_time'] , format='%Y%m%d %H%M')
df = df.set_index('date_time').reindex(pd.date_range("20080322 0000", "20080322 0010", freq='1min').strftime('%Y%m%d %H%M'), fill_value="NaN")
print (df)

The missing rows appear, but every value is NaN. Any ideas?

Best answer

One possible solution is to drop the conversion to datetime and reindex by strings (created with DatetimeIndex.strftime):

df = pd.read_fwf("data", widths=widths)

df = (df.set_index('date_time')
        .reindex(pd.date_range("20080322 0000", "20080322 0010", freq='1min')
                   .strftime('%Y%m%d %H%M')))
print(df)
var1 var2 var3 var4 var6
20080322 0000 0.0 0.0 0.0 0.0 -11.0
20080322 0001 0.0 5.0 0.0 0.0 9.0
20080322 0002 NaN NaN NaN NaN NaN
20080322 0003 5.0 0.0 0.0 0.0 0.0
20080322 0004 0.0 0.0 11.0 0.0 -9.0
20080322 0005 0.0 12.0 0.0 0.0 1.0
20080322 0006 NaN NaN NaN NaN NaN
20080322 0007 NaN NaN NaN NaN NaN
20080322 0008 NaN NaN NaN NaN NaN
20080322 0009 7.0 0.0 0.0 4.0 5.0
20080322 0010 0.0 0.0 0.0 0.0 27.0
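
The string-based reindex above can be tried without the original "data" file. The following is a minimal sketch, assuming a hand-built stand-in DataFrame with the same timestamps and only two of the columns (var1 and var6); the names `fmt`, `labels`, `filled` and `missing` are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for the "data" file: same timestamps, two of the columns.
df = pd.DataFrame({
    "date_time": ["20080322 0000", "20080322 0001", "20080322 0003",
                  "20080322 0004", "20080322 0005", "20080322 0009",
                  "20080322 0010"],
    "var1": [0, 0, 5, 0, 0, 7, 0],
    "var6": [-11, 9, 0, -9, 1, 5, 27],
})

fmt = "%Y%m%d %H%M"

# Full minute grid, rendered as strings in the same format as the index.
labels = pd.date_range("2008-03-22 00:00", "2008-03-22 00:10", freq="1min").strftime(fmt)

# Index and labels are both strings, so existing rows are matched by label.
filled = df.set_index("date_time").reindex(labels)

# Labels that were absent from the original index come back as all-NaN rows.
missing = filled.index[filled.isna().all(axis=1)].tolist()
print(filled)
print(missing)
```

The key point is that the index and the labels passed to reindex have the same type and format; the list in `missing` should correspond to the minutes named in the question.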

Another solution is to drop the strftime call that converts the datetimes to strings, so that the reindex is done by datetimes:

df = pd.read_fwf("data", widths=widths)

df['date_time'] = pd.to_datetime(df['date_time'] , format='%Y%m%d %H%M')
df = (df.set_index('date_time')
        .reindex(pd.date_range("20080322 0000", "20080322 0010", freq='1min')))

Or use DataFrame.asfreq, which works with a DatetimeIndex:

df = pd.read_fwf("data", widths=widths)

df['date_time'] = pd.to_datetime(df['date_time'] , format='%Y%m%d %H%M')
df = df.set_index('date_time').asfreq('1min')
print (df)
var1 var2 var3 var4 var6
2008-03-22 00:00:00 0.0 0.0 0.0 0.0 -11.0
2008-03-22 00:01:00 0.0 5.0 0.0 0.0 9.0
2008-03-22 00:02:00 NaN NaN NaN NaN NaN
2008-03-22 00:03:00 5.0 0.0 0.0 0.0 0.0
2008-03-22 00:04:00 0.0 0.0 11.0 0.0 -9.0
2008-03-22 00:05:00 0.0 12.0 0.0 0.0 1.0
2008-03-22 00:06:00 NaN NaN NaN NaN NaN
2008-03-22 00:07:00 NaN NaN NaN NaN NaN
2008-03-22 00:08:00 NaN NaN NaN NaN NaN
2008-03-22 00:09:00 7.0 0.0 0.0 4.0 5.0
2008-03-22 00:10:00 0.0 0.0 0.0 0.0 27.0
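
The two datetime-based variants can be compared side by side in a self-contained sketch. It assumes a hand-built stand-in for the "data" file with only var1 and var6, and it takes the range endpoints from the index instead of hard-coding them; `by_asfreq`, `by_reindex` and `full_range` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for the "data" file: same timestamps, two of the columns.
df = pd.DataFrame({
    "date_time": ["20080322 0000", "20080322 0001", "20080322 0003",
                  "20080322 0004", "20080322 0005", "20080322 0009",
                  "20080322 0010"],
    "var1": [0, 0, 5, 0, 0, 7, 0],
    "var6": [-11, 9, 0, -9, 1, 5, 27],
})

# Parse the strings and use the resulting datetimes as the index.
df["date_time"] = pd.to_datetime(df["date_time"], format="%Y%m%d %H%M")
df = df.set_index("date_time")

# asfreq covers the span from the first to the last timestamp in the index.
by_asfreq = df.asfreq("1min")

# Explicit reindex over the same span, with endpoints read from the index.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="1min")
by_reindex = df.reindex(full_range)

print(by_asfreq)
print(by_reindex)
```

Both results should contain one row per minute, with NaN in the rows that were not present in the input. asfreq is the shorter spelling when the index is already a DatetimeIndex; the explicit reindex is useful when the desired range extends beyond the first or last observation.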

If needed, restore the original index format at the end with DatetimeIndex.strftime:

df.index = df.index.strftime('%Y%m%d %H%M')
print (df)
var1 var2 var3 var4 var6
20080322 0000 0.0 0.0 0.0 0.0 -11.0
20080322 0001 0.0 5.0 0.0 0.0 9.0
20080322 0002 NaN NaN NaN NaN NaN
20080322 0003 5.0 0.0 0.0 0.0 0.0
20080322 0004 0.0 0.0 11.0 0.0 -9.0
20080322 0005 0.0 12.0 0.0 0.0 1.0
20080322 0006 NaN NaN NaN NaN NaN
20080322 0007 NaN NaN NaN NaN NaN
20080322 0008 NaN NaN NaN NaN NaN
20080322 0009 7.0 0.0 0.0 4.0 5.0
20080322 0010 0.0 0.0 0.0 0.0 27.0
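
One detail of the question's attempt is worth a separate look: it passed fill_value="NaN", which is a string, whereas the answer's snippets omit fill_value and let pandas insert a real missing value. The sketch below, on a tiny made-up frame, shows the difference; the frame and the variable names are assumptions made for illustration only.

```python
import pandas as pd

# Tiny illustrative frame with a one-minute gap at 00:01.
idx = pd.to_datetime(["20080322 0000", "20080322 0002"], format="%Y%m%d %H%M")
df = pd.DataFrame({"var1": [0, 5]}, index=idx)
full_range = pd.date_range(idx[0], idx[-1], freq="1min")
gap = full_range[1]

# The string "NaN" is stored as ordinary text, so isna() does not flag it.
with_string = df.reindex(full_range, fill_value="NaN")

# With no fill_value, the new row holds a real missing value.
with_default = df.reindex(full_range)

# A numeric fill_value keeps the column numeric and leaves no gaps.
with_zero = df.reindex(full_range, fill_value=0)

print(repr(with_string.loc[gap, "var1"]))
print(with_string["var1"].isna().sum(), with_default["var1"].isna().sum())
print(with_zero["var1"].tolist())
```

If the inserted rows should be detectable with isna() or fillable later with fillna or interpolate, leaving fill_value at its default is the safer choice.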

Regarding "python - Pandas - Filling in missing times in time series data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59018422/
