
python - Collapse multiple timestamp rows into one

Reposted. Author: 行者123. Updated: 2023-12-04 08:19:13

I have a series (stored as a single-column DataFrame) like this:

import pandas as pd

s = pd.DataFrame({'ts': [1, 2, 3, 6, 7, 11, 12, 13]})
s

   ts
0   1
1   2
2   3
3   6
4   7
5  11
6  12
7  13
I want to collapse consecutive rows whose difference is less than MAX_DIFF (2). That means the desired output must be:
[{'ts_from': 1, 'ts_to': 3},
 {'ts_from': 6, 'ts_to': 7},
 {'ts_from': 11, 'ts_to': 13}]
I wrote some code:
s['close'] = s.diff().shift(-1)
s['close'] = s[s['close'] > MAX_DIFF].astype('bool')
s['close'].iloc[-1] = True

parts = []
ts_from = None

for _, row in s.iterrows():
    if row['close'] is True:
        part = {'ts_from': ts_from, 'ts_to': row['ts']}
        parts.append(part)
        ts_from = None
        continue

    if not ts_from:
        ts_from = row['ts']
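This works, but iterating with iterrows() does not seem optimal. I thought about ranking, but could not figure out how to implement it so that I could then group by the ranks.
Is there a way to optimize the algorithm?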

Best answer

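You can create the groups by checking where the difference between consecutive rows reaches your threshold (here, is at least 2) and taking a cumulative sum of those flags. Then agg however you like, perhaps with first and last in this case.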

gp = s['ts'].diff().abs().ge(2).cumsum().rename(None)
res = s.groupby(gp).agg(ts_from=('ts', 'first'),
                        ts_to=('ts', 'last'))
#   ts_from  ts_to
#0        1      3
#1        6      7
#2       11     13
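Since the title talks about timestamps, here is a minimal sketch of the same grouper applied to real datetime values. The sample dates and the 2-second threshold are made up for illustration and are not from the question; the only change is that the threshold becomes a pd.Timedelta.

```python
import pandas as pd

# Hypothetical data: the question's integers re-expressed as seconds
# within one minute (not part of the original question).
df = pd.DataFrame({'ts': pd.to_datetime([
    '2021-01-01 00:00:01', '2021-01-01 00:00:02', '2021-01-01 00:00:03',
    '2021-01-01 00:00:06', '2021-01-01 00:00:07',
    '2021-01-01 00:00:11', '2021-01-01 00:00:12', '2021-01-01 00:00:13',
])})

max_diff = pd.Timedelta(seconds=2)  # assumed threshold

# diff() yields timedeltas; the first row is NaT, which compares as False,
# so it stays in group 0.
gp = df['ts'].diff().abs().ge(max_diff).cumsum().rename(None)
res = df.groupby(gp).agg(ts_from=('ts', 'first'),
                         ts_to=('ts', 'last'))
print(res)
```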
If you want a list of dicts, then:
res.to_dict('records')
#[{'ts_from': 1, 'ts_to': 3},
# {'ts_from': 6, 'ts_to': 7},
# {'ts_from': 11, 'ts_to': 13}]
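If you need this in more than one place, the steps can be wrapped in a small function. This is only a sketch: collapse_runs and its parameter names are hypothetical, and it assumes the rows are already in the order in which neighbours should be compared.

```python
import pandas as pd

def collapse_runs(df, col='ts', max_diff=2):
    """Collapse consecutive rows whose gap is below max_diff.

    Hypothetical helper; assumes the rows of df are already ordered.
    """
    # A new group starts wherever the gap to the previous row is >= max_diff.
    group_id = df[col].diff().abs().ge(max_diff).cumsum().rename(None)
    res = df.groupby(group_id).agg(ts_from=(col, 'first'),
                                   ts_to=(col, 'last'))
    return res.to_dict('records')

s = pd.DataFrame({'ts': [1, 2, 3, 6, 7, 11, 12, 13]})
parts = collapse_runs(s)
print(parts)
```

Passing max_diff as an argument keeps the threshold from the question (MAX_DIFF) out of the grouping expression itself.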

For completeness, here is how the grouper aligns with the DataFrame:
s['gp'] = gp
print(s)

   ts  gp
0   1   0  # `1` becomes ts_from for group 0
1   2   0
2   3   0  # `3` becomes ts_to for group 0
3   6   1  # `6` becomes ts_from for group 1
4   7   1  # `7` becomes ts_to for group 1
5  11   2  # `11` becomes ts_from for group 2
6  12   2
7  13   2  # `13` becomes ts_to for group 2
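To see where those labels come from, it may help to split the one-liner into its intermediate steps. The names gap, starts_new and labels below are only illustrative; the logic is the same diff, compare, cumsum chain.

```python
import pandas as pd

MAX_DIFF = 2  # threshold from the question

s = pd.DataFrame({'ts': [1, 2, 3, 6, 7, 11, 12, 13]})

# Gap between each row and the previous one (NaN for the first row).
gap = s['ts'].diff().abs()

# True where a new group starts. NaN >= 2 evaluates to False,
# so the first row does not open an extra group.
starts_new = gap.ge(MAX_DIFF)

# Running count of group starts gives one label per row.
labels = starts_new.cumsum()

steps = pd.DataFrame({'ts': s['ts'], 'gap': gap,
                      'starts_new': starts_new, 'label': labels})
print(steps)
```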

Regarding "python - Collapse multiple timestamp rows into one", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65568995/
