
python - Correctly computing event pairs per attribute value with Pandas using itertools, melt and groupby

Reposted. Author: 行者123. Updated: 2023-12-03 16:47:12

I have a table in the following format:

Id  | Sequence  | Attribute A | Attribute B
ID1 | [A,B,C,D] | A1          | B1
ID2 | [A,B,F,G] | A2          | B3
ID3 | [A,B,C,D] | A1          | B1
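For anyone who wants to reproduce this, the sample table can be built roughly as follows. This is only a sketch: it assumes that each `Sequence` cell is a Python list of single-letter event names, which is how the table above reads but is not stated explicitly.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: each Sequence cell is a plain Python list of event labels.
df = pd.DataFrame({
    "Id": ["ID1", "ID2", "ID3"],
    "Sequence": [list("ABCD"), list("ABFG"), list("ABCD")],
    "Attribute A": ["A1", "A2", "A1"],
    "Attribute B": ["B1", "B3", "B1"],
})

print(df)
```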
I want to count the number of unique IDs for each event combination and attribute value.
The final table should look like this:
Pair  | Attribute Type | Attribute Value | ID Count
(A,B) | Attribute A    | A1              | 2         # Event A happens before event B in 2 unique IDs where A1 is the value of Attribute A.
(A,C) | Attribute A    | A1              | 2
(A,D) | Attribute A    | A1              | 2
(B,C) | Attribute A    | A1              | 2
(B,D) | Attribute A    | A1              | 2
(C,D) | Attribute A    | A1              | 2
(A,B) | Attribute A    | A2              | 1
(A,F) | Attribute A    | A2              | 1
(A,G) | Attribute A    | A2              | 1
(B,F) | Attribute A    | A2              | 1
(B,G) | Attribute A    | A2              | 1
(F,G) | Attribute A    | A2              | 1
(A,B) | Attribute B    | B1              | 2
(A,C) | Attribute B    | B1              | 2
(A,D) | Attribute B    | B1              | 2
(B,C) | Attribute B    | B1              | 2
(B,D) | Attribute B    | B1              | 2
(C,D) | Attribute B    | B1              | 2
(A,B) | Attribute B    | B3              | 1
(A,F) | Attribute B    | B3              | 1
(A,G) | Attribute B    | B3              | 1
(B,F) | Attribute B    | B3              | 1
(B,G) | Attribute B    | B3              | 1
(F,G) | Attribute B    | B3              | 1
What is the correct way to do this? In practice I will have more than 2 attributes.
This is how far I have got:

import itertools

df['Sequence Combs'] = df['Sequence'].apply(lambda x: list(itertools.combinations(x, 2)))

Id  | Sequence  | Sequence Combs                        | Attribute A | Attribute B
ID1 | [A,B,C,D] | [(A,B),(A,C),(A,D),(B,C),(B,D),(C,D)] | A1          | B1
ID2 | [A,B,F,G] | [(A,B),(A,F),(A,G),(B,F),(B,G),(F,G)] | A2          | B3
ID3 | [A,B,C,D] | [(A,B),(A,C),(A,D),(B,C),(B,D),(C,D)] | A1          | B1
And after exploding:

df = df.explode('Sequence Combs')

I get the following:

Id  | Sequence  | Sequence Combs | Attribute A | Attribute B
ID1 | [A,B,C,D] | (A,B)          | A1          | B1
ID1 | [A,B,C,D] | (A,C)          | A1          | B1
ID1 | [A,B,C,D] | (A,D)          | A1          | B1
ID1 | [A,B,C,D] | (B,C)          | A1          | B1
ID1 | [A,B,C,D] | (B,D)          | A1          | B1
ID1 | [A,B,C,D] | (C,D)          | A1          | B1
... | ...       | ...            | ...         | ...

But I am not sure how to proceed from here. Any ideas?

Best answer

You can do it like this:

from itertools import combinations

# helper that returns the list of 2-combinations of a sequence (order is preserved)
def combs(x):
    return list(combinations(x, r=2))

# new DataFrame in which the Sequence column holds the list of 2-combinations
res = df.assign(seq=df['Sequence'].apply(combs)).drop('Sequence', axis=1).rename(columns={'seq' : 'Sequence'})

# explode to one row per pair, then melt the attribute columns into long format
res = res.explode('Sequence').melt(id_vars=['Id', 'Sequence'], var_name='Attribute Type', value_name='Attribute Value')

# finally group by all the columns but Id, and count
res = res.groupby(['Sequence', 'Attribute Type', 'Attribute Value'])['Id'].count()

print(res)
Output
Sequence  Attribute Type  Attribute Value
(A, B)    Attribute A     A1                 2
                          A2                 1
          Attribute B     B1                 2
                          B3                 1
(A, C)    Attribute A     A1                 2
          Attribute B     B1                 2
(A, D)    Attribute A     A1                 2
          Attribute B     B1                 2
(A, F)    Attribute A     A2                 1
          Attribute B     B3                 1
(A, G)    Attribute A     A2                 1
          Attribute B     B3                 1
(B, C)    Attribute A     A1                 2
          Attribute B     B1                 2
(B, D)    Attribute A     A1                 2
          Attribute B     B1                 2
(B, F)    Attribute A     A2                 1
          Attribute B     B3                 1
(B, G)    Attribute A     A2                 1
          Attribute B     B3                 1
(C, D)    Attribute A     A1                 2
          Attribute B     B1                 2
(F, G)    Attribute A     A2                 1
          Attribute B     B3                 1
Name: Id, dtype: int64
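One caveat worth checking against your own data: the question asks for the number of unique IDs, while `count()` counts rows. The two agree on the sample because no sequence repeats an event. The sketch below uses hypothetical data (a sequence `[A, B, A, B]`, not taken from the question) to show how they can differ and how `nunique` behaves in that case. The column names `rows` and `unique_ids` are made up for illustration.

```python
from itertools import combinations

import pandas as pd

# Hypothetical data: ID1 repeats events, so the pair (A, B) occurs three times in it.
df = pd.DataFrame({
    "Id": ["ID1", "ID2"],
    "Sequence": [["A", "B", "A", "B"], ["A", "B"]],
    "Attribute A": ["A1", "A1"],
})

# Same pipeline as above: pairs -> explode -> melt.
long = (
    df.assign(Sequence=df["Sequence"].apply(lambda s: list(combinations(s, 2))))
      .explode("Sequence")
      .melt(id_vars=["Id", "Sequence"],
            var_name="Attribute Type", value_name="Attribute Value")
)

# Compare the row count with the number of distinct Ids per group.
keys = ["Sequence", "Attribute Type", "Attribute Value"]
counts = long.groupby(keys, as_index=False).agg(
    rows=("Id", "count"),
    unique_ids=("Id", "nunique"),
)

print(counts)
# For the pair (A, B): rows is 4 (three from ID1, one from ID2), unique_ids is 2.
```

If your sequences never repeat an event, `count()` and `nunique()` give the same result and either is fine.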
If you want to match the expected output, replace the final groupby step above with this one (here res is still the melted DataFrame):
# finally group by all the columns but Id, count, and rename the count column
res = res.groupby(['Sequence', 'Attribute Type', 'Attribute Value'], as_index=False)['Id'].count().rename(columns={'Id' : 'ID Count'}).sort_values('Attribute Type')

print(res)
Output
    Sequence  Attribute Type  Attribute Value  ID Count
0     (A, B)     Attribute A               A1         2
1     (A, B)     Attribute A               A2         1
20    (C, D)     Attribute A               A1         2
4     (A, C)     Attribute A               A1         2
6     (A, D)     Attribute A               A1         2
18    (B, G)     Attribute A               A2         1
8     (A, F)     Attribute A               A2         1
10    (A, G)     Attribute A               A2         1
22    (F, G)     Attribute A               A2         1
12    (B, C)     Attribute A               A1         2
16    (B, F)     Attribute A               A2         1
14    (B, D)     Attribute A               A1         2
21    (C, D)     Attribute B               B1         2
19    (B, G)     Attribute B               B3         1
17    (B, F)     Attribute B               B3         1
11    (A, G)     Attribute B               B3         1
13    (B, C)     Attribute B               B1         2
9     (A, F)     Attribute B               B3         1
7     (A, D)     Attribute B               B1         2
5     (A, C)     Attribute B               B1         2
3     (A, B)     Attribute B               B3         1
2     (A, B)     Attribute B               B1         2
15    (B, D)     Attribute B               B1         2
23    (F, G)     Attribute B               B3         1
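The rows above are grouped by attribute type but otherwise in no particular order. If you also want the row order of the table in the question (attribute type, then attribute value, then pair), one possible variant is to put the attribute columns first in the `groupby` keys, since `groupby` sorts by its keys by default. This is a sketch, not part of the original answer: the column name `Pair` is taken from the question's expected table, and `nunique` is used instead of `count` because the question asks for unique IDs.

```python
from itertools import combinations

import pandas as pd

# Sample data from the question (Sequence cells assumed to be lists).
df = pd.DataFrame({
    "Id": ["ID1", "ID2", "ID3"],
    "Sequence": [list("ABCD"), list("ABFG"), list("ABCD")],
    "Attribute A": ["A1", "A2", "A1"],
    "Attribute B": ["B1", "B3", "B1"],
})

# One row per (Id, pair, attribute type, attribute value).
long = (
    df.assign(Pair=df["Sequence"].apply(lambda s: list(combinations(s, 2))))
      .drop(columns="Sequence")
      .explode("Pair")
      .melt(id_vars=["Id", "Pair"],
            var_name="Attribute Type", value_name="Attribute Value")
)

# groupby sorts by its keys (sort=True is the default), so the key order
# below determines the row order of the result.
out = long.groupby(
    ["Attribute Type", "Attribute Value", "Pair"], as_index=False
).agg(**{"ID Count": ("Id", "nunique")})

# Put the columns in the order used by the expected table.
out = out[["Pair", "Attribute Type", "Attribute Value", "ID Count"]]

print(out.to_string(index=False))
```

Because every column other than `Id` and `Pair` is melted, this should extend to more than two attribute columns without changes.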

Regarding python - correctly computing event pairs per attribute value with Pandas using itertools, melt and groupby, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65296068/
