
python - Resampling a time series with duplicate values

Reposted. Author: 行者123. Updated: 2023-12-01 00:26:01

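I am trying to resample a time series that contains duplicate values. I want the resampled series to have one time point every 0.1 seconds. Rows created for the new time points should hold NaN in every column, and the existing rows should stay unchanged.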

import pandas as pd
import numpy as np

d1 = ({
'Value' : ['A','A',np.nan,np.nan,'B','B','B'],
'Other' : ['X','X',np.nan,np.nan,'X','X',np.nan],
'Col' : [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.4','2019-08-02 09:50:10.7','2019-08-02 09:50:10.7','2019-08-02 09:50:10.7','2019-08-02 09:50:10.8'],
'Count' : [1,1,np.nan,5,6,7,8],
})

df1 = pd.DataFrame(data = d1)

df1['Time'] = pd.to_datetime(df1['Time'])

df1 = (df1.set_index(['Time', df1.groupby('Time').cumcount()])
          .unstack()
          .asfreq('0.1S', method='pad')
          .stack()
          .reset_index(level=1, drop=True)
          .sort_index()
          .reset_index())

Output:

                     Time Value Other  Col  Count
0 2019-08-02 09:50:10.100     A     X  NaN    1.0
1 2019-08-02 09:50:10.200     A     X  NaN    1.0
2 2019-08-02 09:50:10.300     A     X  NaN    1.0
3 2019-08-02 09:50:10.700   NaN   NaN  NaN    5.0
4 2019-08-02 09:50:10.700     B     X  NaN    6.0
5 2019-08-02 09:50:10.700     B     X  NaN    7.0
6 2019-08-02 09:50:10.800     B   NaN  NaN    8.0

Expected output:

                     Time Value Other  Col  Count
0 2019-08-02 09:50:10.100     A     X  NaN    1.0
1 2019-08-02 09:50:10.200     A     X  NaN    1.0
2 2019-08-02 09:50:10.300   NaN   NaN  NaN    NaN
3 2019-08-02 09:50:10.400   NaN   NaN  NaN    NaN
4 2019-08-02 09:50:10.500   NaN   NaN  NaN    NaN
5 2019-08-02 09:50:10.600   NaN   NaN  NaN    NaN
6 2019-08-02 09:50:10.700   NaN   NaN  NaN    5.0
7 2019-08-02 09:50:10.700     B     X  NaN    6.0
8 2019-08-02 09:50:10.700     B     X  NaN    7.0
9 2019-08-02 09:50:10.800     B   NaN  NaN    8.0
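
A likely reason the attempt fills the 10.300 row with the values from 10.200 is the `method='pad'` argument: when `asfreq` reindexes, padding copies the previous label's values into newly created labels, while NaN values at labels that already existed are left alone. The following is a minimal sketch on a toy Series (the variable names are illustrative, not from the question) that shows both effects.

```python
import numpy as np
import pandas as pd

# Three timestamps with a gap at 10.3; the value at 10.4 is already NaN
idx = pd.to_datetime([
    '2019-08-02 09:50:10.1',
    '2019-08-02 09:50:10.2',
    '2019-08-02 09:50:10.4',
])
s = pd.Series([1.0, 2.0, np.nan], index=idx)

plain = s.asfreq('100ms')                  # new labels are left as NaN
padded = s.asfreq('100ms', method='pad')   # new labels copy the previous label's value

new_stamp = pd.Timestamp('2019-08-02 09:50:10.3')  # created by asfreq
old_stamp = pd.Timestamp('2019-08-02 09:50:10.4')  # present in the input

print(plain[new_stamp])    # nan
print(padded[new_stamp])   # 2.0
print(padded[old_stamp])   # nan, padding does not touch existing labels
```

Dropping `method='pad'` keeps the new rows empty, which is closer to the expected output for the created time points.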

Best answer

Try using:

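# Spread duplicate timestamps across columns, resample to 100 ms, then restore the long layout
df1 = (df1.set_index(['Time', df1.groupby('Time').cumcount()])
          .unstack()
          .asfreq('100ms', method='pad')
          .stack()
          .reset_index(level=1, drop=True)
          .sort_index()
          .reset_index())

# Full 100 ms grid between the first and last timestamp
dr = pd.date_range(df1['Time'].iloc[0], df1['Time'].iloc[-1], freq='100ms')
# One row per grid timestamp missing from df1; every other column is NaN
df2 = pd.DataFrame({'Time': dr[~dr.isin(df1['Time'])]}, columns=df1.columns)
print(pd.concat([df1, df2]).sort_values('Time').reset_index(drop=True))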

Output:

                     Time  Col  Count Other Value
0 2019-08-02 09:50:10.100  NaN    1.0     X     A
1 2019-08-02 09:50:10.200  NaN    1.0     X     A
2 2019-08-02 09:50:10.300  NaN    1.0     X     A
3 2019-08-02 09:50:10.400  NaN    NaN   NaN   NaN
4 2019-08-02 09:50:10.500  NaN    NaN   NaN   NaN
5 2019-08-02 09:50:10.600  NaN    NaN   NaN   NaN
6 2019-08-02 09:50:10.700  NaN    5.0   NaN   NaN
7 2019-08-02 09:50:10.700  NaN    6.0     X     B
8 2019-08-02 09:50:10.700  NaN    7.0     X     B
9 2019-08-02 09:50:10.800  NaN    8.0   NaN     B

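As you can see, I added the last three lines of code. They create a new DataFrame, df2, that holds the timestamps from the 100 ms range that do not appear in df1, with every other column set to NaN. Finally, I concatenate the two DataFrames, sort by timestamp, and reset the index, and that's it.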
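
Note that in the output above the 10.300 row still carries the padded values (1.0, X, A), whereas the expected output in the question has NaN there. If the goal is to match the expected output exactly, one possible variation is to skip the `asfreq` step and only append the missing timestamps to the original frame. The sketch below assumes the existing timestamps should be kept as they are; the helper name `add_missing_timestamps` and its parameters are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd


def add_missing_timestamps(df, time_col='Time', freq='100ms'):
    """Hypothetical helper: add an all-NaN row for each grid timestamp absent from df."""
    # Regular grid from the earliest to the latest timestamp
    grid = pd.date_range(df[time_col].min(), df[time_col].max(), freq=freq)
    # Grid points that have no row yet
    missing = grid[~grid.isin(df[time_col])]
    filler = pd.DataFrame({time_col: missing})
    combined = pd.concat([df, filler], ignore_index=True)
    # mergesort is stable, so rows sharing a timestamp keep their original order
    return combined.sort_values(time_col, kind='mergesort').reset_index(drop=True)


# Same data as in the question, with Time listed first
df = pd.DataFrame({
    'Time': pd.to_datetime([
        '2019-08-02 09:50:10.1', '2019-08-02 09:50:10.2',
        '2019-08-02 09:50:10.4', '2019-08-02 09:50:10.7',
        '2019-08-02 09:50:10.7', '2019-08-02 09:50:10.7',
        '2019-08-02 09:50:10.8',
    ]),
    'Value': ['A', 'A', np.nan, np.nan, 'B', 'B', 'B'],
    'Other': ['X', 'X', np.nan, np.nan, 'X', 'X', np.nan],
    'Col': [np.nan] * 7,
    'Count': [1, 1, np.nan, 5, 6, 7, 8],
})

result = add_missing_timestamps(df)
print(result)
```

With the question's data this should give ten rows: the seven original rows unchanged, plus empty rows at 10.300, 10.500 and 10.600 (the 10.400 row already exists in the input and is all NaN apart from its timestamp).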

This question, python - Resampling a time series with duplicate values, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/58585588/
