
python - Appending DataFrames in a for loop

Reposted. Author: 行者123. Updated: 2023-11-28 18:29:07

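Suppose I have a pandas DataFrame with three columns, id, start_time and end_time, and I want to convert it into a DataFrame with two columns, id and time.

For example, from [001, 1, 3], [002, 3, 4] to [001, 1], [001, 2], [001, 3], [002, 3], [002, 4].

Currently I am using a for loop and appending to the DataFrame on every iteration, but it is very slow. Is there another way that saves time?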
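
For context, here is a minimal sketch of the loop-based approach described above, with one common mitigation: the per-row pieces are collected in a plain list and pd.concat is called once at the end, rather than growing a DataFrame on every iteration. The names pieces and result are illustrative, and the sketch assumes start_time and end_time are integers with end_time included in the output.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame([['001', 1, 3], ['002', 3, 4]],
                  columns=['id', 'start_time', 'end_time'])

# Build one small frame per input row and keep them in a list.
pieces = []
for row in df.itertuples(index=False):
    times = list(range(row.start_time, row.end_time + 1))  # end_time included
    pieces.append(pd.DataFrame({'id': row.id, 'time': times}))

# Concatenate once instead of appending inside the loop.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

This still loops in Python, so it is likely to remain slow for very large inputs, but it avoids copying the accumulated frame on every iteration.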

Best answer

If start_time and end_time are timedelta values, use:

import pandas as pd

df = pd.DataFrame([['001', 1, 3], ['002', 3, 4]],
                  columns=['id', 'start_time', 'end_time'])
print(df)
    id  start_time  end_time
0  001           1         3
1  002           3         4

# stack start_time and end_time into a single 'time' column
df1 = pd.melt(df, id_vars='id', value_name='time').drop('variable', axis=1)
# convert int to timedelta
df1['time'] = pd.to_timedelta(df1.time, unit='s')
df1.set_index('time', inplace=True)
print(df1)
           id
time
00:00:01  001
00:00:03  002
00:00:03  001
00:00:04  002

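# group by id and resample at one-second steps, forward-filling the id
# ('1S' is the alias used in the original answer; recent pandas releases prefer '1s')
print(df1.groupby('id')
         .resample('1S')
         .ffill()
         .reset_index(drop=True, level=0)
         .reset_index())

      time   id
0 00:00:01  001
1 00:00:02  001
2 00:00:03  001
3 00:00:03  002
4 00:00:04  002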
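
If the times are plain integers, as in the question, a different technique from the melt/resample chain above may also work: build one list of consecutive values per row and let DataFrame.explode expand it. This is only a sketch, not the answer's method. It assumes integer columns, an inclusive end_time, and a pandas version where explode accepts ignore_index (1.1 or later); time_lists and expanded are illustrative names.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame([['001', 1, 3], ['002', 3, 4]],
                  columns=['id', 'start_time', 'end_time'])

# One list of consecutive integers per row, end_time included.
time_lists = [list(range(s, e + 1))
              for s, e in zip(df['start_time'], df['end_time'])]

# explode() gives every list element its own row and repeats the id.
expanded = (df[['id']]
            .assign(time=pd.Series(time_lists, index=df.index))
            .explode('time', ignore_index=True))

# explode leaves an object-dtype column, so cast back to integers.
expanded['time'] = expanded['time'].astype(int)
print(expanded)
```

The only Python-level loop here is the list comprehension that builds the ranges; the expansion into rows is handled by pandas.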

If start_time and end_time are datetime values, use:

df = pd.DataFrame([['001', '2016-01-01', '2016-01-03'],
                   ['002', '2016-01-03', '2016-01-04']],
                  columns=['id', 'start_time', 'end_time'])
print(df)
    id  start_time    end_time
0  001  2016-01-01  2016-01-03
1  002  2016-01-03  2016-01-04

# stack start_time and end_time into a single 'time' column
df1 = pd.melt(df, id_vars='id', value_name='time').drop('variable', axis=1)
# convert to datetime
df1['time'] = pd.to_datetime(df1.time)
df1.set_index('time', inplace=True)
print(df1)
             id
time
2016-01-01  001
2016-01-03  002
2016-01-03  001
2016-01-04  002

# group by id and resample at one-day steps, forward-filling the id
print(df1.groupby('id')
         .resample('1D')
         .ffill()
         .reset_index(drop=True, level=0)
         .reset_index())

        time   id
0 2016-01-01  001
1 2016-01-02  001
2 2016-01-03  001
3 2016-01-03  002
4 2016-01-04  002
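
For date columns, a similar expansion can be sketched with pd.date_range and DataFrame.explode instead of the melt/resample chain above. This is an alternative under stated assumptions, not the answer's method: daily steps, both endpoints included, and pandas 1.1 or later for the ignore_index argument. The names day_lists and expanded are illustrative.

```python
import pandas as pd

# Sample data taken from the answer above.
df = pd.DataFrame([['001', '2016-01-01', '2016-01-03'],
                   ['002', '2016-01-03', '2016-01-04']],
                  columns=['id', 'start_time', 'end_time'])

# One list of daily timestamps per row; date_range includes both endpoints.
day_lists = [list(pd.date_range(s, e, freq='D'))
             for s, e in zip(df['start_time'], df['end_time'])]

# explode() gives every timestamp its own row and repeats the id.
expanded = (df[['id']]
            .assign(time=pd.Series(day_lists, index=df.index))
            .explode('time', ignore_index=True))

# Restore a proper datetime dtype after explode.
expanded['time'] = pd.to_datetime(expanded['time'])
print(expanded)
```

Changing freq (for example to an hourly step) would adjust the granularity, provided the start and end values carry enough precision.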

Regarding "python - Appending DataFrames in a for loop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38910642/
