
python - Pandas: keep only the specified subsequence (groupby, order, keep subsequence)

Repost. Author: 行者123. Updated: 2023-12-03 23:05:24

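Given an event log in a Pandas DataFrame, where each "id" goes through a series of "action" values at given "timestamp" values, I want to keep only the rows that correspond to a specified sequence of actions.
Example input data: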

import pandas as pd 
# Create a sample data-frame from a dictionary
id = ['A123', 'A123', 'A123', 'A123', 'A123', 'A123', 'A234', 'A234', 'A234', 'A234', 'A341', 'A341', 'A341', 'A341', 'A341', 'A341', 'A341', 'A341', 'A341', 'A341']
action = ['A', 'B', 'C', 'D', 'B', 'A', 'B', 'A', 'C', 'D', 'D', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'B', 'C']
timestamp = ['1', '2', '3', '4', '5', '6', '1', '2', '3', '4', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
the_dict = {'id': id, 'action': action, 'timestamp': timestamp}
# This is the sample data-frame with columns:
# id action timestamp
# Each id when ordered by timestamp then action gives the sequence of actions taken by the id
dataFrame = pd.DataFrame(the_dict)
######################################
# Input data
######################################
# id action timestamp
#0 A123 A 1
#1 A123 B 2
#2 A123 C 3
#3 A123 D 4
#4 A123 B 5
#5 A123 A 6
#6 A234 B 1
#7 A234 A 2
#8 A234 C 3
#9 A234 D 4
#10 A341 D 1
#11 A341 B 2
#12 A341 C 3
#13 A341 D 4
#14 A341 A 5
#15 A341 B 6
#16 A341 C 7
#17 A341 D 8
#18 A341 B 9
#19 A341 C 10

# The sequence of interest
the_sequence = ['B', 'C', 'D']

# Desired output: Group by id, order by timestamp, return all rows which match the given sequence of actions
######################################
# The output data-frame:
######################################
# id action timestamp
#1 A123 B 2
#2 A123 C 3
#3 A123 D 4
#11 A341 B 2
#12 A341 C 3
#13 A341 D 4
#15 A341 B 6
#16 A341 C 7
#17 A341 D 8

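Best answer

You can use .shift logic for B, C, and D. Basically, you check for B rows that have C and then D in the next two rows of the same id, which returns the B rows. Then apply the same check, with the shifts adjusted, to pick up the C rows and the D rows.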

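# df is the sample frame built in the question (named dataFrame there)
df = dataFrame
# group_keys=False keeps the original row index on the mask; with the pandas 2.x
# default (group_keys=True) the id is prepended to the index and df[mask] cannot align
mask = df.groupby('id', group_keys=False)['action'].apply(
    lambda x:
    ((x == 'B') & (x.shift(-1) == 'C') & (x.shift(-2) == 'D')) |  # B that starts a B, C, D run
    ((x == 'C') & (x.shift(1) == 'B') & (x.shift(-1) == 'D')) |   # C in the middle of a run
    ((x == 'D') & (x.shift(2) == 'B') & (x.shift(1) == 'C')))     # D that ends a run
df = df[mask]
df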
Output:
      id action timestamp
1   A123      B         2
2   A123      C         3
3   A123      D         4
11  A341      B         2
12  A341      C         3
13  A341      D         4
15  A341      B         6
16  A341      C         7
17  A341      D         8
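
The answer hard-codes the three actions B, C, D. If the sequence should come from a variable such as the_sequence in the question, the same shift idea can be written as a loop. The following is only a sketch under some assumptions, not the answer author's code: the function name keep_subsequence and its parameters id_col, action_col and ts_col are hypothetical, matches are assumed to be contiguous rows within one id (as in the desired output), and the string timestamps are assumed to be numeric so that '10' sorts after '2'.

```python
import pandas as pd


def keep_subsequence(frame, sequence, id_col="id", action_col="action", ts_col="timestamp"):
    """Keep rows that are part of a contiguous run of `sequence` within each id."""
    # Sort by id, then by the timestamp cast to a number (the sample stores strings).
    ordered = (
        frame.assign(_ts=pd.to_numeric(frame[ts_col]))
        .sort_values([id_col, "_ts"])
        .drop(columns="_ts")
    )
    grouped = ordered.groupby(id_col)[action_col]

    # starts[i] is True when the rows beginning at row i (same id) spell out `sequence`.
    starts = pd.Series(True, index=ordered.index)
    for offset, wanted in enumerate(sequence):
        starts &= grouped.shift(-offset) == wanted

    # A row is kept if a match starts at it or at one of the len(sequence) - 1
    # rows before it within the same id.
    starts_by_id = starts.groupby(ordered[id_col])
    keep = pd.Series(False, index=ordered.index)
    for offset in range(len(sequence)):
        keep |= starts_by_id.shift(offset, fill_value=False).astype(bool)

    return ordered[keep]


# Same sample data as in the question, built more compactly.
ids = ["A123"] * 6 + ["A234"] * 4 + ["A341"] * 10
actions = list("ABCDBA" "BACD" "DBCDABCDBC")
timestamps = [str(t) for t in [*range(1, 7), *range(1, 5), *range(1, 11)]]
sample = pd.DataFrame({"id": ids, "action": actions, "timestamp": timestamps})

result = keep_subsequence(sample, ["B", "C", "D"])
print(result)
```

On the sample data this selects the same nine rows as the hard-coded version. Because overlapping matches are combined with a logical OR, a row that belongs to more than one match is still returned only once.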

Regarding "python - Pandas: keep only the specified subsequence (groupby, order, keep subsequence)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63024764/
