
python - How to do a conditionally complex "combining" of two columns in a pandas DataFrame?

Reposted. Author: 行者123. Updated: 2023-12-04 09:38:01

I have a pandas DataFrame df:

cit1   cgen1   cit2   cgen2   pair1   pair2

c1 male c25 female A B (+)
c2 female c25 female A B
c5 male c25 female A B
c5 male c26 male A B

c1 male c1 male A C (*)
c2 female c3 female A C

c1 male c13 male C D
c7 female c13 male C D
c8 male c17 female C D

c8 male c17 female E F
c12 male c17 female E F
...

(Note: the blank lines are inserted arbitrarily for the reader's convenience.)

To make this easier to follow, treat cit1 and cgen1 as a pair, cit2 and cgen2 as a pair, and pair1 and pair2 as a pair.

The resulting DataFrame df2 that I want looks like this:
cit    cgen    pair1    pair2

c1 male A B (&)
c2 female A B
c5 male A B
c25 female A B (&&)
c26 male A B

c1 male A C
c2 female A C
c3 female A C

c1 male C D
c7 female C D
c8 male C D
c13 male C D
c17 female C D

c8 male E F
c12 male E F
c17 female E F
...

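Essentially, I want to form the union columns cit and cgen by combining cit1 and cit2 (for cit) and the corresponding cgen1 and cgen2 (for cgen), for each unique combination of pair1 and pair2 values.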

For example, c1 male from cit1 and cgen1 (+) is recorded as cit and cgen (&).
c25 female from cit2 and cgen2 (+) is recorded as cit and cgen (&&).

There are also cases where cit1 == cit2 for a given pair, marked with (*).

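I have tried different functions such as pandas.merge(), pandas.concat(), and pandas.groupby(), but none of them seemed to produce what I intended. (I am not including the attempted code here because all of it produced nonsense. I can post it in the comments on request.)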

Any insight into how to solve this would be greatly appreciated.

Best answer

Reshape with wide_to_long, then remove duplicates with DataFrame.drop_duplicates and sort with DataFrame.sort_values, which also creates a default index through ignore_index=True:

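import pandas as pd

# Stack the numbered columns (cit1/cit2, cgen1/cgen2) into single cit/cgen columns.
# The result is stored in df2 so that the original df stays available.
df2 = (pd.wide_to_long(df.reset_index(), stubnames=['cit', 'cgen'], i='index', j='tmp')
         .reindex(['cit', 'cgen', 'pair1', 'pair2'], axis=1)
         .drop_duplicates(['pair1', 'pair2', 'cgen', 'cit'])
         # ignore_index=True already yields a fresh 0..n-1 index,
         # so no separate reset_index(drop=True) is needed
         .sort_values(['pair1', 'pair2', 'cit'], ignore_index=True)
      )
print(df2)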
cit cgen pair1 pair2
0 c1 male A B
1 c2 female A B
2 c25 female A B
3 c26 male A B
4 c5 male A B
5 c1 male A C
6 c2 female A C
7 c3 female A C
8 c1 male C D
9 c13 male C D
10 c17 female C D
11 c7 female C D
12 c8 male C D
13 c12 male E F
14 c17 female E F
15 c8 male E F
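
To make the reshaping step easier to inspect, here is a minimal self-contained sketch. It rebuilds the sample rows from the question (the names `rows`, `sample`, `long_df`, and `result` are only illustrative and not part of the original answer) and keeps the intermediate long frame, which is indexed by the original row number and by the numeric suffix `tmp` (1 or 2).

```python
import pandas as pd

# Sample rows copied from the question (the trailing "..." rows are not reproduced).
rows = [
    ("c1", "male", "c25", "female", "A", "B"),
    ("c2", "female", "c25", "female", "A", "B"),
    ("c5", "male", "c25", "female", "A", "B"),
    ("c5", "male", "c26", "male", "A", "B"),
    ("c1", "male", "c1", "male", "A", "C"),
    ("c2", "female", "c3", "female", "A", "C"),
    ("c1", "male", "c13", "male", "C", "D"),
    ("c7", "female", "c13", "male", "C", "D"),
    ("c8", "male", "c17", "female", "C", "D"),
    ("c8", "male", "c17", "female", "E", "F"),
    ("c12", "male", "c17", "female", "E", "F"),
]
sample = pd.DataFrame(
    rows, columns=["cit1", "cgen1", "cit2", "cgen2", "pair1", "pair2"]
)

# Each original row becomes two rows: one for suffix 1 and one for suffix 2.
long_df = pd.wide_to_long(
    sample.reset_index(), stubnames=["cit", "cgen"], i="index", j="tmp"
)

# Drop the (index, tmp) MultiIndex, deduplicate, and sort as in the answer.
result = (
    long_df.reset_index(drop=True)[["cit", "cgen", "pair1", "pair2"]]
    .drop_duplicates()
    .sort_values(["pair1", "pair2", "cit"], ignore_index=True)
)
print(result)
```

The 11 input rows give 22 long rows, and the (*) case where cit1 == cit2 collapses to one row after drop_duplicates.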

Alternatively, select each subset of columns, rename them, join the two pieces with concat, then remove duplicates and sort:
# Map both numbered column sets onto the same target names
d = {'cit1': 'cit', 'cit2': 'cit', 'cgen1': 'cgen', 'cgen2': 'cgen'}
# Built from the original df; the result is stored in df2 so df is not overwritten
df2 = (pd.concat([df[['cit1', 'cgen1', 'pair1', 'pair2']].rename(columns=d),
                  df[['cit2', 'cgen2', 'pair1', 'pair2']].rename(columns=d)])
         .drop_duplicates(['pair1', 'pair2', 'cgen', 'cit'])
         .sort_values(['pair1', 'pair2', 'cit'], ignore_index=True))
print(df2)
cit cgen pair1 pair2
0 c1 male A B
1 c2 female A B
2 c25 female A B
3 c26 male A B
4 c5 male A B
5 c1 male A C
6 c2 female A C
7 c3 female A C
8 c1 male C D
9 c13 male C D
10 c17 female C D
11 c7 female C D
12 c8 male C D
13 c12 male E F
14 c17 female E F
15 c8 male E F
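
Both outputs above sort cit as strings, so c25 comes before c5, whereas the desired df2 in the question lists c5 before c25. If that numeric order matters, one possible approach is to sort on the number embedded in cit. The sketch below is not part of the original answer and assumes every cit value contains a run of digits, as in the sample; `_num` is a hypothetical temporary column, and `sample`, `combined`, and `result` are illustrative names.

```python
import pandas as pd

# Sample rows copied from the question.
sample = pd.DataFrame(
    [
        ("c1", "male", "c25", "female", "A", "B"),
        ("c2", "female", "c25", "female", "A", "B"),
        ("c5", "male", "c25", "female", "A", "B"),
        ("c5", "male", "c26", "male", "A", "B"),
        ("c1", "male", "c1", "male", "A", "C"),
        ("c2", "female", "c3", "female", "A", "C"),
        ("c1", "male", "c13", "male", "C", "D"),
        ("c7", "female", "c13", "male", "C", "D"),
        ("c8", "male", "c17", "female", "C", "D"),
        ("c8", "male", "c17", "female", "E", "F"),
        ("c12", "male", "c17", "female", "E", "F"),
    ],
    columns=["cit1", "cgen1", "cit2", "cgen2", "pair1", "pair2"],
)

d = {"cit1": "cit", "cit2": "cit", "cgen1": "cgen", "cgen2": "cgen"}
combined = pd.concat(
    [
        sample[["cit1", "cgen1", "pair1", "pair2"]].rename(columns=d),
        sample[["cit2", "cgen2", "pair1", "pair2"]].rename(columns=d),
    ],
    ignore_index=True,
).drop_duplicates()

result = (
    combined
    # Assumption: cit looks like "c<number>"; pull out the number for sorting.
    .assign(_num=lambda x: x["cit"].str.extract(r"(\d+)", expand=False).astype(int))
    .sort_values(["pair1", "pair2", "_num"], ignore_index=True)
    .drop(columns="_num")
)
print(result)
```

With the sample data, this ordering matches the desired df2 in the question: c1, c2, c5, c25, c26 for the A B pair.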

Regarding python - How to do a conditionally complex "combining" of two columns in a pandas DataFrame?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62463455/
