
python - Filling in missing dates in a DataFrame

Reposted. Author: 行者123. Updated: 2023-12-01 06:37:21

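I have a dataframe like the one shown below.

(image: dataframe)

It has 8 columns and n rows. The first column holds dates, with some days missing (such as 1946-01-04), but it also contains duplicates (such as 1946-01-02). I would like code that keeps those duplicates, fills in the missing dates, and puts NaN in the other cells of the added rows.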

I tried:

dfx = pd.DataFrame(None, index=pd.DatetimeIndex(start=df.地震の発生日時.min(), end=df.地震の発生日時.max(), freq='D'))
df = df.apply(pd.concat([df, dfx], join='outer', axis=1))

But it only appends the dates from .min() to .max() at the end of the file. I want the missing dates filled into the data in place, for example:

Date        Time        Places  w     x      y    z
1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
1946-01-03 04:36:00 6.5 35.5 139.5 50 3 1
1946-01-04 00:00:00 NaN NaN NaN NaN NaN NaN
1946-01-06 10:56:00 8.1 41.5 143.4 51 5.2 3

By the way, I can't use an inner join. It throws: AttributeError: 'Places' is not a valid function for 'Series' object

Best answer

Solution for the case where the first column is a DatetimeIndex holding dates without times:

print (df)
Time Places w x y z col
Date
1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
1946-01-05 04:36:00 6.5 35.5 139.5 50 3.0 1

print (df.index)
DatetimeIndex(['1946-01-02', '1946-01-02', '1946-01-02', '1946-01-03',
'1946-01-05'],
dtype='datetime64[ns]', name='Date', freq=None)
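
The answer assumes the dates already sit in a DatetimeIndex. If they are still strings in an ordinary column, something like the following sketch could get the frame into that shape first. The frame `raw` and its values are hypothetical, made up for illustration and shortened to three columns.

```python
import pandas as pd

# Hypothetical raw data: dates stored as strings in an ordinary column.
raw = pd.DataFrame({
    "Date": ["1946-01-02", "1946-01-02", "1946-01-02", "1946-01-03", "1946-01-05"],
    "Time": ["14:45:00", "22:18:00", "23:29:00", "04:28:00", "04:36:00"],
    "Places": [6.8, 7.6, 6.7, 5.6, 6.5],
})

# Parse the strings into datetimes, then move the column into the index.
raw["Date"] = pd.to_datetime(raw["Date"])
df = raw.set_index("Date")

print(df.index)
```

Duplicate dates are kept as duplicate index labels, which is what the join step relies on.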

Create a new, empty DataFrame whose index is built with date_range:

dfx = pd.DataFrame(index=pd.date_range(start=df.index.min(), 
end=df.index.max(), freq='D'))

print (dfx)
Empty DataFrame
Columns: []
Index: [1946-01-02 00:00:00, 1946-01-03 00:00:00, 1946-01-04 00:00:00, 1946-01-05 00:00:00]

Then use DataFrame.join:

df = dfx.join(df)
print (df)
Time Places w x y z col
1946-01-02 14:45:00 6.8 36.3 140.1 31.0 3.2 1.0
1946-01-02 22:18:00 7.6 40.5 141.4 0.0 4.6 3.0
1946-01-02 23:29:00 6.7 36.1 139.4 39.0 4.3 2.0
1946-01-03 04:28:00 5.6 34.4 136.5 1.0 4.2 2.0
1946-01-04 NaN NaN NaN NaN NaN NaN NaN
1946-01-05 04:36:00 6.5 35.5 139.5 50.0 3.0 1.0
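
Put together, the date-only approach can be sketched as one self-contained script. The sample frame below is a trimmed stand-in for the data above (only three of the columns), and the names `full_days` and `filled` are chosen here for illustration.

```python
import pandas as pd

# Trimmed sample: a date-only DatetimeIndex with a duplicated day and a gap.
df = pd.DataFrame(
    {
        "Time": ["14:45:00", "22:18:00", "23:29:00", "04:28:00", "04:36:00"],
        "Places": [6.8, 7.6, 6.7, 5.6, 6.5],
        "col": [1, 3, 2, 2, 1],
    },
    index=pd.DatetimeIndex(
        ["1946-01-02", "1946-01-02", "1946-01-02", "1946-01-03", "1946-01-05"],
        name="Date",
    ),
)

# Empty frame with one row per calendar day between the first and last date.
full_days = pd.DataFrame(
    index=pd.date_range(start=df.index.min(), end=df.index.max(), freq="D")
)

# Left join on the index: duplicated days keep all their rows,
# days missing from df get a single row of NaN.
filled = full_days.join(df)
print(filled)
```

Because the join introduces NaN, integer columns such as `col` come back as floats, as in the output above.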

If the DatetimeIndex includes times, first turn it into a column with DataFrame.reset_index:

print (df)
Places w x y z col
DateTime
1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
1946-01-05 04:36:00 6.5 35.5 139.5 50 3.0 1

print (df.index)
DatetimeIndex(['1946-01-02 14:45:00', '1946-01-02 22:18:00',
'1946-01-02 23:29:00', '1946-01-03 04:28:00',
'1946-01-05 04:36:00'],
dtype='datetime64[ns]', name='DateTime', freq=None)
df = df.reset_index()
print (df)
DateTime Places w x y z col
0 1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1 1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
2 1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
3 1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
4 1946-01-05 04:36:00 6.5 35.5 139.5 50 3.0 1

Then strip the times with Series.dt.normalize, merge, and finally fill the missing values in the DateTime column:

d = df['DateTime'].dt.normalize()
dfx = pd.DataFrame({'Dates':pd.date_range(start=d.min(),
end=d.max(), freq='D')})

print (dfx)
Dates
0 1946-01-02
1 1946-01-03
2 1946-01-04
3 1946-01-05

df = dfx.merge(df.assign(Dates=d), on='Dates', how='left')
df['DateTime'] = df['DateTime'].fillna(df['Dates'])
print (df)
Dates DateTime Places w x y z col
0 1946-01-02 1946-01-02 14:45:00 6.8 36.3 140.1 31.0 3.2 1.0
1 1946-01-02 1946-01-02 22:18:00 7.6 40.5 141.4 0.0 4.6 3.0
2 1946-01-02 1946-01-02 23:29:00 6.7 36.1 139.4 39.0 4.3 2.0
3 1946-01-03 1946-01-03 04:28:00 5.6 34.4 136.5 1.0 4.2 2.0
4 1946-01-04 1946-01-04 00:00:00 NaN NaN NaN NaN NaN NaN
5 1946-01-05 1946-01-05 04:36:00 6.5 35.5 139.5 50.0 3.0 1.0
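
The merge-based variant can likewise be sketched end to end. The sample frame is again a trimmed stand-in, the names `days`, `calendar` and `filled` are illustrative, and the final `drop` is an optional extra step that is not part of the answer above.

```python
import pandas as pd

# Trimmed sample: timestamps in an ordinary column, with 1946-01-04 missing.
df = pd.DataFrame(
    {
        "DateTime": pd.to_datetime(
            [
                "1946-01-02 14:45:00",
                "1946-01-02 22:18:00",
                "1946-01-02 23:29:00",
                "1946-01-03 04:28:00",
                "1946-01-05 04:36:00",
            ]
        ),
        "Places": [6.8, 7.6, 6.7, 5.6, 6.5],
        "col": [1, 3, 2, 2, 1],
    }
)

# Midnight of each row's day, used as the merge key.
days = df["DateTime"].dt.normalize()

# One row per calendar day between the first and last day.
calendar = pd.DataFrame(
    {"Dates": pd.date_range(start=days.min(), end=days.max(), freq="D")}
)

# Left merge keeps every calendar day; days without data get NaN/NaT.
filled = calendar.merge(df.assign(Dates=days), on="Dates", how="left")

# Rows added for missing days have no timestamp yet: use midnight of that day.
filled["DateTime"] = filled["DateTime"].fillna(filled["Dates"])

# Optional: the helper key is no longer needed once DateTime is complete.
filled = filled.drop(columns="Dates")
print(filled)
```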

Regarding "python - Filling in missing dates in a DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59607157/
