
Python Pandas resampling

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:41:03

I have the following DataFrame:

   Timestamp  S_time1  S_time2  End_Time_1  End_time_2  Sign_1  Sign_2
0    2413044        0        0           0           0       x       x
1    2422476        0        0           0           0       x       x
2    2431908        0        0           0           0       x       x
3    2441341        0        0           0           0       x       x
4    2541232  2526631  2528631     2520631     2530631      10      80
5    2560273  2544946  2546496     2546496     2548496      40      80
6    2577224  2564010  2566010     2566010     2568010    null    null
7    2592905  2580959  2582959     2582959     2584959    null    null

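The table goes on like this. The first column is a timestamp in milliseconds. S_time1 and End_time_1 delimit the period during which a particular sign (a number) appears. For example, in the fifth row S_time1 is 2526631, End_time_1 is 2520631, and the corresponding sign_1 is 10, meaning that sign 10 is displayed from 2526631 to 2520631. The same goes for S_time2 and End_time_2: the corresponding value in sign_2 appears for the period from S_time2 to End_time_2.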

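I want to resample the index column (Timestamp) into 100-millisecond bins and check which bins each sign falls into. For example, each start time and end time are 2000 milliseconds apart, so the corresponding sign should repeat in about 20 consecutive bins, because each bin is 100 milliseconds wide. I therefore need only two columns: one with the bin time and one with the sign. It should look like the following table (the bin times are made up for the example):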

Bin_time   signs
...100 0
...200 0
...300 10
...400 10
...500 10
...600 10

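Sign 10 covers the period from the corresponding S_time1 to End_time_1. The next sign, 80, then continues for the period from S_time2 to End_time_2. I am not sure whether this can be done in Pandas, but I really need help, with pandas or any other approach.

Thanks in advance for your help and suggestions.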
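The "about 20 bins per sign" arithmetic can be checked with a small sketch. It assumes bins are aligned to multiples of 100 ms and that both interval ends are inclusive; neither is stated in the question, and `bins_in_interval` is a hypothetical helper name. The interval used is S_time2 to End_time_2 from row 5 of the table.

```python
import numpy as np

BIN_MS = 100  # bin width in milliseconds, as in the question


def bins_in_interval(start_ms, end_ms, bin_ms=BIN_MS):
    """Return the bin times (multiples of bin_ms) lying in [start_ms, end_ms].

    Assumption: bins sit on multiples of bin_ms and both ends are inclusive.
    """
    # Round the start up to the next bin boundary (ceiling division).
    first = -(-start_ms // bin_ms) * bin_ms
    return np.arange(first, end_ms + 1, bin_ms)


# Row 5 of the sample data: S_time2 = 2546496, End_time_2 = 2548496, Sign_2 = 80
bins = bins_in_interval(2546496, 2548496)
print(len(bins))          # 20
print(bins[0], bins[-1])  # 2546500 2548400
```

With a different bin alignment the count can be 20 or 21, which is why the question says "about 20".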

Best answer

Input:

print(df)
   Timestamp  S_time1  S_time2  End_Time_1  End_time_2  Sign_1  Sign_2
0    2413044        0        0           0           0       x       x
1    2422476        0        0           0           0       x       x
2    2431908        0        0           0           0       x       x
3    2441341        0        0           0           0       x       x
4    2541232  2526631  2528631     2520631     2530631      10      80
5    2560273  2544946  2546496     2546496     2548496      40      80
6    2577224  2564010  2566010     2566010     2568010    null    null
7    2592905  2580959  2582959     2582959     2584959    null    null
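To run the code on this input, the frame can be rebuilt by hand. This is a sketch that assumes the sign columns hold strings, since the sample mixes numbers with the literal values x and null; the original thread does not say how the data was loaded.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: Sign_1 / Sign_2 are stored as strings ('x', 'null', '10', ...).
df = pd.DataFrame({
    "Timestamp":  [2413044, 2422476, 2431908, 2441341,
                   2541232, 2560273, 2577224, 2592905],
    "S_time1":    [0, 0, 0, 0, 2526631, 2544946, 2564010, 2580959],
    "S_time2":    [0, 0, 0, 0, 2528631, 2546496, 2566010, 2582959],
    "End_Time_1": [0, 0, 0, 0, 2520631, 2546496, 2566010, 2582959],
    "End_time_2": [0, 0, 0, 0, 2530631, 2548496, 2568010, 2584959],
    "Sign_1":     ["x", "x", "x", "x", "10", "40", "null", "null"],
    "Sign_2":     ["x", "x", "x", "x", "80", "80", "null", "null"],
})

print(df)
```

Note that in row 4 the value of End_Time_1 is smaller than S_time1, exactly as in the question, so no bin can fall inside that first interval.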

Two approaches, s (row-wise apply) and m (merge on a helper column). In the timings below m is roughly four times faster:

In [231]: %timeit s(df)
1 loops, best of 3: 2.78 s per loop

In [232]: %timeit m(df)
1 loops, best of 3: 690 ms per loop

import numpy as np
import pandas as pd

def m(df):
    # work on a copy so the caller's frame is left untouched
    df = df.copy()
    # resample column Timestamp by 100ms, convert back to integers
    df['Timestamp'] = pd.to_timedelta(df['Timestamp'], unit='ms')
    df['i'] = 1
    df = df.set_index('Timestamp')
    # only the resampled index is needed, so a single column is enough
    df1 = df[['i']].resample('100ms').first().reset_index()
    df1['Timestamp'] = (df1['Timestamp'] / np.timedelta64(1, 'ms')).astype(int)
    # helper column i for merging: pairs every bin with every original row
    df1['i'] = 1

    out = df1.merge(df, on='i', how='left')
    # keep the bins that fall inside the first / second start-end interval
    out1 = out[['Timestamp', 'Sign_1']][(out.Timestamp >= out.S_time1) & (out.Timestamp <= out.End_Time_1)]
    out2 = out[['Timestamp', 'Sign_2']][(out.Timestamp >= out.S_time2) & (out.Timestamp <= out.End_time_2)]

    # note: the column named Bin_time holds the sign value
    out1 = out1.rename(columns={'Sign_1': 'Bin_time'})
    out2 = out2.rename(columns={'Sign_2': 'Bin_time'})

    # Sign_1 matches come first, so they win when a bin matches both intervals
    df = pd.concat([out1, out2], ignore_index=True).drop_duplicates(subset='Timestamp')
    df1 = df1.set_index('Timestamp')
    df = df.set_index('Timestamp')
    # reindex so that bins without any sign appear as NaN
    df = df.reindex(df1.index).reset_index()
    return df

def s(df):
    # work on a copy so the caller's frame is left untouched
    df = df.copy()
    # resample column Timestamp by 100ms, convert back to integers
    df['Timestamp'] = pd.to_timedelta(df['Timestamp'], unit='ms')
    df = df.set_index('Timestamp')
    # only the resampled index is needed; the helper column is dropped again
    out = df[['S_time1']].resample('100ms').first().reset_index().drop(columns='S_time1')
    out['Timestamp'] = (out['Timestamp'] / np.timedelta64(1, 'ms')).astype(int)

    # search the start/end intervals that contain the bin in row x
    def search(x):
        mask1 = (df.S_time1 <= x['Timestamp']) & (df.End_Time_1 >= x['Timestamp'])
        # if at least one True, return the first matching Sign_1
        if mask1.any():
            return df.loc[mask1].Sign_1.iloc[0]
        # otherwise check the second start and end time
        mask2 = (df.S_time2 <= x['Timestamp']) & (df.End_time_2 >= x['Timestamp'])
        if mask2.any():
            # if at least one True, return the first matching Sign_2
            return df.loc[mask2].Sign_2.iloc[0]
        # if all False, return NaN
        return np.nan

    out['Bin_time'] = out.apply(search, axis=1)
    return out
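As a possible alternative to both functions, the cross merge and the row-wise apply can be avoided by locating each interval's bins with np.searchsorted. This is only a sketch and not part of the original answer: the function name bin_signs, the output column names, and the choice to align bins to multiples of 100 ms are assumptions. It keeps the same priority as the answer (Sign_1 wins when a bin matches both intervals).

```python
import numpy as np
import pandas as pd


def bin_signs(df, bin_ms=100):
    """Sketch: label each bin with the sign whose [start, end] interval covers it."""
    # Bin times from the first Timestamp (rounded down) to the last one.
    lo = df["Timestamp"].min() // bin_ms * bin_ms
    bins = np.arange(lo, df["Timestamp"].max() + 1, bin_ms)

    # Stack both (start, end, sign) triples into one long table, Sign_1 rows first.
    first = pd.DataFrame({"start": df["S_time1"], "end": df["End_Time_1"], "sign": df["Sign_1"]})
    second = pd.DataFrame({"start": df["S_time2"], "end": df["End_time_2"], "sign": df["Sign_2"]})
    intervals = pd.concat([first, second], ignore_index=True)

    signs = np.full(len(bins), np.nan, dtype=object)
    # Walk the intervals backwards so that earlier ones overwrite later ones,
    # i.e. the first matching interval wins.
    for start, end, sign in intervals.iloc[::-1].itertuples(index=False):
        i0 = np.searchsorted(bins, start, side="left")   # first bin >= start
        i1 = np.searchsorted(bins, end, side="right")    # one past last bin <= end
        signs[i0:i1] = sign  # empty slice (no-op) when end < start or out of range
    return pd.DataFrame({"Bin_time": bins, "signs": signs})


# Small made-up frame to illustrate the behaviour.
small = pd.DataFrame({
    "Timestamp":  [1000, 2000],
    "S_time1":    [1200, 0],
    "End_Time_1": [1400, 0],
    "S_time2":    [1400, 0],
    "End_time_2": [1600, 0],
    "Sign_1":     ["10", "x"],
    "Sign_2":     ["80", "x"],
})
out = bin_signs(small)
print(out)
```

The loop runs once per interval rather than once per bin, so its cost grows with the number of rows in the input, not with the number of 100 ms bins.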

Regarding Python Pandas resampling, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33898674/
