
python - Identifying consecutive occurrences of a value in a DataFrame

Reposted. Author: 行者123. Updated: 2023-11-28 22:39:54

Consider the following DataFrame df:

import pandas as pd
d = {"A":[3, 3, 3, 2, 3, 3, 2, 2, 2, 3, 3, 2], "B": [3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 3, 3]}
df = pd.DataFrame.from_dict(d)

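I am interested in identifying the periods during which each column's value equals 2. Specifically, I want to print a message indicating at which index the value 2 first appears and how long it stays at 2 (again in terms of the index), ignoring single occurrences. So for the DataFrame above, the answer should look like this: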

Column A: Value 2 was observed at instance 6 and continued till instance 8.
Column B: Value 2 was observed at instance 9 and continued till instance 10.

I can do this with while and for loops, but is there a Pythonic way to do it? Thanks for your help.

Best answer

You can split the matching indices into consecutive runs with np.split:

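import numpy as np
import pandas as pd

d = {"A": [3, 3, 3, 2, 3, 3, 2, 2, 2, 3, 3, 2], "B": [3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 3, 3]}
df = pd.DataFrame.from_dict(d)

# True where both the current and the previous row equal 2,
# so isolated single 2s are never marked
mask = (df == 2) & (df.shift() == 2)

inds_a = mask["A"][mask["A"]].index.values
inds_b = mask["B"][mask["B"]].index.values

for ind in [inds_a, inds_b]:
    # Split the index array wherever consecutive indices are not adjacent
    for sub in np.split(ind, np.where(np.diff(ind) != 1)[0] + 1):
        # Skip the empty piece produced when a column has no run of 2s
        if sub.size:
            # The run starts one row before the first marked index
            print("2 appeared at {} to {}".format(sub[0] - 1, sub[-1]))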
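
To see why the loop prints sub[0] - 1 rather than sub[0], it can help to look at the shifted mask on a single column. A minimal sketch, using column B from the question (the names b and hits are illustrative and not part of the original answer):

```python
import pandas as pd

b = pd.Series([3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 3, 3])

# True only where this row and the previous row are both 2
mask = (b == 2) & (b.shift() == 2)

# The first row of a run is never marked, because its predecessor is not 2
hits = mask[mask].index.tolist()
print(hits)         # [9]
print(hits[0] - 1)  # 8, the row where the run actually starts
```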

It may be faster to just take the indices and filter inside the split loop:

import numpy as np

# df is the DataFrame defined above
mask = df == 2
inds_a = mask.A[mask.A].index.values
inds_b = mask.B[mask.B].index.values

for ind in [inds_a, inds_b]:
    for sub in np.split(ind, np.where(np.diff(ind) != 1)[0] + 1):
        # Keep only runs longer than one row
        if sub.size > 1:
            print("2 appeared at {} to {}".format(sub[0], sub[-1]))

Output:

2 appeared at 6 to 8
2 appeared at 8 to 9
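
The split step is easier to follow on a concrete array. The sketch below traces the positions where column A equals 2 through np.diff, np.where, and np.split; the intermediate variable names (gaps, breaks, runs, long_runs) are illustrative and do not appear in the answer.

```python
import numpy as np

# Index positions where column A equals 2 in the example
ind = np.array([3, 6, 7, 8, 11])

gaps = np.diff(ind)                  # [3, 1, 1, 3]
breaks = np.where(gaps != 1)[0] + 1  # [1, 4]: positions in ind where a new run starts
runs = np.split(ind, breaks)         # [3], [6, 7, 8], [11]

# Dropping single-element runs leaves only the run from 6 to 8
long_runs = [r.tolist() for r in runs if r.size > 1]
print(long_runs)  # [[6, 7, 8]]
```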

Interestingly, I found that using itertools.groupby was actually the fastest:

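from itertools import groupby

for col in df:
    ind = prev = 0
    # Group consecutive rows by whether they equal 2
    for is_two, group in groupby(df[col], key=lambda x: x == 2):
        ind += sum(1 for _ in group)  # ind is now one past the end of this group
        # Report runs of 2s that are longer than one row
        if is_two and prev + 1 != ind:
            print("2 appeared at {} to {}".format(prev, ind - 1))
        prev = ind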

Output:

2 appeared at 6 to 8
2 appeared at 8 to 9
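
If the message format from the question is wanted for every column, one possible sketch wraps the run detection in a helper. This is not the answer's method: it swaps in a pandas-only technique (labelling runs with a cumulative sum over mask flips), and the function name runs_of_value and its parameters are hypothetical. It reports the DataFrame's own 0-based index labels, so column B comes out as 8 to 9.

```python
import pandas as pd

def runs_of_value(series, value=2, min_length=2):
    """Return (start, end) index labels for runs of `value` at least `min_length` rows long."""
    mask = series == value
    # The run id increases by one every time the mask flips between True and False
    run_id = mask.astype(int).diff().ne(0).cumsum()
    runs = []
    for _, run in series[mask].groupby(run_id[mask]):
        if len(run) >= min_length:
            runs.append((run.index[0], run.index[-1]))
    return runs

d = {"A": [3, 3, 3, 2, 3, 3, 2, 2, 2, 3, 3, 2], "B": [3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 3, 3]}
df = pd.DataFrame.from_dict(d)

messages = []
for col in df:
    for start, end in runs_of_value(df[col]):
        messages.append(
            "Column {}: Value 2 was observed at instance {} and continued till instance {}.".format(col, start, end)
        )
print("\n".join(messages))
```

Raising min_length filters out shorter runs, and passing a different value looks for runs of that value instead.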

Regarding python - Identifying consecutive occurrences of a value in a DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34266488/
