
Python: creating combinations of IDs based on a condition

Reposted. Author: 行者123. Updated: 2023-12-01 08:18:09

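Hello, I would like to create combinations of IDs. I know how to create all possible combinations, but I am stuck on the last part of the operation. Any help would be appreciated.

I have a dataset like the following:

import pandas as pd
from itertools import combinations_with_replacement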

d1 = {'Subject': ['Subject1','Subject1','Subject1','Subject2','Subject2','Subject2','Subject3','Subject3','Subject3','Subject4','Subject4','Subject4','Subject5','Subject5','Subject5'],
'Actual':['1','0','0','0','0','1','0','1','0','0','0','0','1','0','1'],
'Event':['1','2','3','1','2','3','1','2','3','1','2','3','1','2','3'],
'Category':['1','1','2','1','1','2','2','2','2','1','1','1','1','2','1'],
'Variable1':['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15'],
'Variable2':['12','11','10','9','8','7','6','5','4','3','2','1','-1','-2','-3'],
'Variable3': ['-6','-5','-4','-3','-4','-3','-2','-1','0','1','2','3','4','5','6']}
d1 = pd.DataFrame(d1)

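I want to create all possible combinations of subjects within each event and each stratum (category). This is done with the following code, taken from an earlier question, "Form groups of individuals python (pandas)":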

L = [(i[0], i[1], y[0], y[1])
     for i, x in d1.groupby(['Event','Category'])['Subject']
     for y in list(combinations_with_replacement(x, 2))]
df = pd.DataFrame(L, columns=['Event','Category','Subject_IDcol1','Subject_IDcol2'])

Now I want to keep every subject with Actual = 1 and randomly select n subjects with Actual = 0. For simplicity, take n = 1. I then want to run combinations_with_replacement on this new list.
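As a rough illustration of the selection step described above, here is a minimal sketch. The helper name `select_subjects`, its `seed` parameter, and the reduced `demo` frame are assumptions made for this example and are not part of the original question. It shuffles the Actual = 0 rows once and keeps the first n of each (Event, Category) group.

```python
import pandas as pd


def select_subjects(df, n, seed=None):
    """Hypothetical helper: keep every Actual == '1' row plus up to n
    randomly chosen Actual == '0' rows per (Event, Category) group."""
    is_actual = df['Actual'] == '1'
    # Shuffle the non-actual rows once, then take the first n rows of each group.
    sampled = (
        df[~is_actual]
        .sample(frac=1, random_state=seed)
        .groupby(['Event', 'Category'], sort=False)
        .head(n)
    )
    selected = pd.concat([df[is_actual], sampled])
    return selected.sort_values(['Event', 'Category', 'Subject']).reset_index(drop=True)


# Reduced copy of the question's data: only the columns the selection needs.
demo = pd.DataFrame({
    'Subject': [f'Subject{i}' for i in range(1, 6) for _ in range(3)],
    'Actual': list('100001010000101'),
    'Event': list('123' * 5),
    'Category': list('112112222111121'),
})

picked = select_subjects(demo, n=1, seed=0)
print(picked)
```

Which Actual = 0 subject is kept in each group depends on the seed, so the printed rows may differ from any particular expected result.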

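The output I am after would look, for example, like this (with the random pick assumed):

For Event 1, Category 1: Subjects 1 and 5 have Actual = 1, and suppose Subject 2 is the one drawn at random.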

[image: table of the desired output]

By comparison, in the earlier case the result looked like this (for Event = 1 and Category = 1):

[image: table of the earlier output]
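For Event 1, Category 1, the two cases described above can be reproduced directly as a quick check. This is only a sketch: the list names are made up for illustration, and Subject2 stands in for whichever Actual = 0 subject happens to be drawn. According to the dataset, the full group consists of Subjects 1, 2, 4 and 5.

```python
from itertools import combinations_with_replacement

# Desired case: the two Actual = 1 subjects plus one randomly drawn Actual = 0 subject.
selected = ['Subject1', 'Subject2', 'Subject5']
# Earlier case: every subject in Event 1, Category 1.
everyone = ['Subject1', 'Subject2', 'Subject4', 'Subject5']

wanted = list(combinations_with_replacement(selected, 2))
before = list(combinations_with_replacement(everyone, 2))

for pair in wanted:
    print(pair)
print(len(wanted), len(before))  # 6 10
```

Three subjects give 6 pairs (including self-pairs), while the unfiltered group of four gives 10.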

Any help would be appreciated. Thank you.

Best answer

I think this is one way to do what you want:

import itertools
import pandas as pd
import numpy as np

d1 = {
    'Subject': ['Subject1', 'Subject1', 'Subject1', 'Subject2', 'Subject2', 'Subject2',
                'Subject3', 'Subject3', 'Subject3', 'Subject4', 'Subject4', 'Subject4',
                'Subject5', 'Subject5', 'Subject5'],
    'Actual': ['1', '0', '0', '0', '0', '1', '0', '1', '0', '0', '0', '0', '1', '0', '1'],
    'Event': ['1', '2', '3', '1', '2', '3', '1', '2', '3', '1', '2', '3', '1', '2', '3'],
    'Category': ['1', '1', '2', '1', '1', '2', '2', '2', '2', '1', '1', '1', '1', '2', '1'],
    'Variable1': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15'],
    'Variable2': ['12', '11', '10', '9', '8', '7', '6', '5', '4', '3', '2', '1', '-1', '-2', '-3'],
    'Variable3': ['-6', '-5', '-4', '-3', '-4', '-3', '-2', '-1', '0', '1', '2', '3', '4', '5', '6']
}
d1 = pd.DataFrame(d1)
num_nonactual = 1

np.random.seed(100)
# First leave only up to num_nonactual subjects with actual != '1' for each event/category
g1 = d1.groupby(['Event', 'Category', 'Actual'], group_keys=False)
d2 = g1.apply(lambda x: x if x.name[2] == '1' else x.sample(min(num_nonactual, len(x))))
# Then do the same as before
d2.sort_values('Subject', inplace=True)
L = [(i1, i2, y1, y2)
     for (i1, i2), x in d2.groupby(['Event', 'Category'])['Subject']
     for y1, y2 in itertools.combinations_with_replacement(x, 2)]
df = pd.DataFrame(L, columns=['Event', 'Category', 'Subject_IDcol1', 'Subject_IDcol2'])
print(df)

Output:

   Event Category Subject_IDcol1 Subject_IDcol2
0      1        1       Subject1       Subject1
1      1        1       Subject1       Subject4
2      1        1       Subject1       Subject5
3      1        1       Subject4       Subject4
4      1        1       Subject4       Subject5
5      1        1       Subject5       Subject5
6      1        2       Subject3       Subject3
7      2        1       Subject2       Subject2
8      2        2       Subject3       Subject3
9      2        2       Subject3       Subject5
10     2        2       Subject5       Subject5
11     3        1       Subject4       Subject4
12     3        1       Subject4       Subject5
13     3        1       Subject5       Subject5
14     3        2       Subject2       Subject2
15     3        2       Subject2       Subject3
16     3        2       Subject3       Subject3
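One way to sanity-check a result like the one above: a group with k selected subjects should contribute k(k+1)/2 pairs, since combinations_with_replacement of size 2 includes the self-pairs. The sketch below assumes the per-group subject lists read off the output table; the dictionary name `selected` is made up for this illustration.

```python
from itertools import combinations_with_replacement
from math import comb

# Subjects kept in each (Event, Category) group, as read off the output table.
selected = {
    ('1', '1'): ['Subject1', 'Subject4', 'Subject5'],
    ('1', '2'): ['Subject3'],
    ('2', '1'): ['Subject2'],
    ('2', '2'): ['Subject3', 'Subject5'],
    ('3', '1'): ['Subject4', 'Subject5'],
    ('3', '2'): ['Subject2', 'Subject3'],
}

# Count the pairs generated per group and compare with the closed form k(k+1)/2.
counts = {key: len(list(combinations_with_replacement(subs, 2)))
          for key, subs in selected.items()}
closed_form = {key: comb(len(subs) + 1, 2) for key, subs in selected.items()}

total = sum(counts.values())
print(counts)
print(total)  # 17, matching rows 0 to 16 of the table
```

Each group here contains all of its Actual = 1 subjects plus at most one Actual = 0 subject, which is what num_nonactual = 1 asks for.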

On creating combinations of IDs based on a condition in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54858893/
