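I have three relevant columns: time, id, and interaction. How can I create a new column that lists the ids whose "interaction" value is "1" within the same time window?
It should look like this: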
time  id    vec_len  quadrant  interaction  Paired with
1     3271   0.9     7         0
1     3229   0.1     0         0
1     4228   0.5     0         0
1     2778  -0.3     5         0
2     4228   0.2     0         0
2     3271   0.1     6         0
2     3229  -0.7     5         1            [2778, 4228]
2     3229  -0.3     2         0
2     4228  -0.8     5         1            [2778, 3229]
2     2778  -0.6     5         1            [4228, 3229]
3     4228   0.2     0         0
3     3271   0.1     6         0
3     4228  -0.7     5         1            [3271]
3     3229  -0.3     2         0
3     3271  -0.8     5         1            [4228]
Thank you for your help!!
import numpy as np
# df is the DataFrame from the question (columns: time, id, interaction)
# initialize dict with a separate empty set for each time block
dict_time_ids = {t: set() for t in df.time.unique()}
# populate dictionary with ids for each time block where interaction == 1
dict_time_ids.update(df.query('interaction == 1').groupby('time').id.apply(set).to_dict())
# make new column with set of corresponding ids where interaction == 1
df['paired'] = np.where(df.interaction == 1, df.time.apply(lambda x: dict_time_ids[x]), set())
# remove each row's own id from its set and convert to list
df.paired = df.apply(lambda x: list(x.paired - {x.id}), axis=1)
# Out:
time id interaction paired
0 1 3271 0 []
1 1 3229 0 []
2 1 4228 0 []
3 1 2778 0 []
4 2 4228 0 []
5 2 3271 0 []
6 2 3229 1 [2778, 4228]
7 2 3229 0 []
8 2 4228 1 [2778, 3229]
9 2 2778 1 [4228, 3229]
10 3 4228 0 []
11 3 3271 0 []
12 3 4228 1 [3271]
13 3 3229 0 []
14 3 3271 1 [4228]
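
To try this without the original data, here is a minimal self-contained sketch of the same idea. It rebuilds the time, id and interaction columns from the question's sample table and uses a plain dictionary lookup instead of np.where. The names active and paired are only illustrative, and the lists are sorted so the order is deterministic, which means it can differ from the set ordering shown above.

```python
import pandas as pd

# Sample data copied from the question (time, id and interaction only).
df = pd.DataFrame({
    "time": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "id": [3271, 3229, 4228, 2778, 4228, 3271, 3229, 3229, 4228, 2778,
           4228, 3271, 4228, 3229, 3271],
    "interaction": [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1],
})

# Map each time block to the set of ids that have interaction == 1 in it.
active = (
    df[df["interaction"] == 1]
    .groupby("time")["id"]
    .apply(set)
    .to_dict()
)

# For interacting rows, list the other interacting ids in the same time block;
# every other row gets an empty list.
df["paired"] = [
    sorted(active.get(t, set()) - {i}) if flag == 1 else []
    for t, i, flag in zip(df["time"], df["id"], df["interaction"])
]

print(df)
```

Building the column with a list comprehension avoids a row-wise df.apply, and time blocks without any interaction fall back to an empty set through dict.get.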