
python - Filter out consecutive rows from a dataset

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:56:55

I have a dataset with index values and a single datetime variable, like this:

1 2017-01-03 09:30:01.958
46 2017-01-03 09:30:47.879
99 2017-01-03 09:33:48.121
117 2017-01-03 09:47:06.215
139 2017-01-03 09:51:06.054
1567 2017-01-03 14:17:18.949
2480 2017-01-03 15:57:13.442
2481 2017-01-03 15:57:14.333
2486 2017-01-03 15:57:37.500
2487 2017-01-03 15:57:38.677
2489 2017-01-03 15:57:41.053
2491 2017-01-03 15:57:54.870
2498 2017-01-03 15:59:24.210

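What I want to do is remove consecutive rows from the data (keeping only the first observation in each run); in this case, the code should drop the rows with index 2481 and 2487. I tried using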

df[df.index.diff() == 0].drop()

but it only returns

AttributeError: 'Int64Index' object has no attribute 'diff'

Best answer

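You can use boolean indexing. Because `diff` is not implemented on the index here (hence the AttributeError), first convert the index to a Series with `to_series`, then keep every row whose index does not differ from the previous one by exactly 1: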

df = df[df.index.to_series().diff() != 1]
print(df)
date
1 2017-01-03 09:30:01.958
46 2017-01-03 09:30:47.879
99 2017-01-03 09:33:48.121
117 2017-01-03 09:47:06.215
139 2017-01-03 09:51:06.054
1567 2017-01-03 14:17:18.949
2480 2017-01-03 15:57:13.442
2486 2017-01-03 15:57:37.500
2489 2017-01-03 15:57:41.053
2491 2017-01-03 15:57:54.870
2498 2017-01-03 15:59:24.210
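
To check this approach end to end, the following is a minimal sketch that rebuilds the question's sample data and applies the same mask. The column name `date` comes from the printed output above; the variable names `idx`, `keep` and `filtered` are introduced here for illustration only.

```python
import pandas as pd

# Index labels and timestamps copied from the question's sample data.
idx = [1, 46, 99, 117, 139, 1567, 2480, 2481, 2486, 2487, 2489, 2491, 2498]
dates = pd.to_datetime([
    "2017-01-03 09:30:01.958", "2017-01-03 09:30:47.879",
    "2017-01-03 09:33:48.121", "2017-01-03 09:47:06.215",
    "2017-01-03 09:51:06.054", "2017-01-03 14:17:18.949",
    "2017-01-03 15:57:13.442", "2017-01-03 15:57:14.333",
    "2017-01-03 15:57:37.500", "2017-01-03 15:57:38.677",
    "2017-01-03 15:57:41.053", "2017-01-03 15:57:54.870",
    "2017-01-03 15:59:24.210",
])
df = pd.DataFrame({"date": dates}, index=idx)

# diff() is the gap to the previous index label. The first entry is NaN,
# and NaN != 1 evaluates to True, so the first row is always kept.
keep = df.index.to_series().diff() != 1
filtered = df[keep]

print(filtered.index.tolist())
```

Because every label that sits exactly 1 above its predecessor is dropped, a longer run such as 5, 6, 7 would keep only 5.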

Thanks to piRSquared for the NumPy alternative:

df[np.append(0, np.diff(df.index.values)) != 1]
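
A small sketch of how the NumPy mask is built, step by step. The toy frame, its `value` column and the names `gaps`, `mask` and `result` are made up for illustration and are not part of the original answer. `np.diff` returns one element fewer than its input, so a leading 0 is prepended; since 0 != 1, the first row always survives.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 11 and 12 continue the run that starts at 10,
# and 31 continues the run that starts at 30.
df = pd.DataFrame({"value": range(6)}, index=[10, 11, 12, 20, 30, 31])

# Gaps between neighbouring labels, padded with 0 for the first row.
gaps = np.append(0, np.diff(df.index.values))  # [0, 1, 1, 8, 10, 1]

# Keep only rows whose gap to the previous label is not exactly 1.
mask = gaps != 1
result = df[mask]

print(result.index.tolist())  # [10, 20, 30]
```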

Timings:

#[11000 rows x 1 columns]
df = pd.concat([df]*1000)

In [60]: %timeit [True] + [(i[0]+1) != i[1] for i in zip(df.index.tolist(), df.index.tolist()[1:])]
100 loops, best of 3: 4.19 ms per loop

In [61]: %timeit np.append(0, np.diff(df.index.values)) != 1
The slowest run took 4.72 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 33.1 µs per loop

In [62]: %timeit df.index.to_series().diff() != 1
1000 loops, best of 3: 260 µs per loop
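
Outside IPython, a comparison along these lines could be reproduced with the standard `timeit` module. This is only a sketch: the helper names (`list_mask`, `numpy_mask`, `pandas_mask`), the placeholder column `x` and the repetition count are assumptions, and absolute numbers will vary with hardware and library versions. It also confirms that the three expressions produce the same mask before timing them.

```python
import timeit

import numpy as np
import pandas as pd

# The 11 filtered index labels repeated 1000 times, giving 11000 rows
# as in the timing setup above. The column "x" is a placeholder.
idx = [1, 46, 99, 117, 139, 1567, 2480, 2486, 2489, 2491, 2498] * 1000
df = pd.DataFrame({"x": np.arange(len(idx))}, index=idx)


def list_mask():
    # Pure-Python comparison of each label with its predecessor.
    labels = df.index.tolist()
    return [True] + [(a + 1) != b for a, b in zip(labels, labels[1:])]


def numpy_mask():
    # NumPy difference, padded with 0 so the first row is kept.
    return np.append(0, np.diff(df.index.values)) != 1


def pandas_mask():
    # pandas difference; the leading NaN compares unequal to 1.
    return (df.index.to_series().diff() != 1).to_numpy()


# All three variants should agree before their speed is compared.
assert np.array_equal(list_mask(), numpy_mask())
assert np.array_equal(numpy_mask(), pandas_mask())

for fn in (list_mask, numpy_mask, pandas_mask):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e6:.1f} µs per call")
```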

Regarding python - filtering out consecutive rows from a dataset, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45429440/
