
python - Collapsing rows of a pandas DataFrame that share a column value

Reposted. Author: 行者123. Updated: 2023-11-28 20:59:33

I feel there must be a very straightforward way to do this, but I can't find it.

So, I have this data (note that the values in the description column share a common prefix across several rows):

import pandas as pd

data = {"description": ["AAAA:A", "AAAA:B", "AAAA:C", "AAAA:D", "BBBB:A", "BBBB:B"],
        "sequence": ["AAAAAAAAAAA", "AAAAAAABBBBBB", "AAAAAAAACCCCCCC", "AAAAAAAADDDDDDD",
                     "BBBBBBAAAAA", "BBBBBBBBBBBBB"]}

df = pd.DataFrame(data)
print(df)

# description sequence
#0 AAAA:A AAAAAAAAAAA
#1 AAAA:B AAAAAAABBBBBB
#2 AAAA:C AAAAAAAACCCCCCC
#3 AAAA:D AAAAAAAADDDDDDD
#4 BBBB:A BBBBBBAAAAA
#5 BBBB:B BBBBBBBBBBBBB

My end goal is to bring all the sequences that share a 4-letter description together into a single row, like this:

#  description   sequence_A     sequence_B       sequence_C       sequence_D
#0 AAAA AAAAAAAAAAA AAAAAAABBBBBB AAAAAAAACCCCCCC AAAAAAAADDDDDDD
#1 BBBB BBBBBBAAAAA BBBBBBBBBBBBB NaN NaN

So far, I have gotten to this point:

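# Split each description into its 4-letter prefix and its suffix letter,
# and store the sequence under a column named after that suffix
df = df.apply(lambda row: pd.Series({"description": row["description"].split(":")[0],
                                     "sequence_{}".format(row["description"].split(":")[1]): row["sequence"]}),
              axis=1)
print(df)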

# description sequence_A sequence_B sequence_C sequence_D
#0 AAAA AAAAAAAAAAA NaN NaN NaN
#1 AAAA NaN AAAAAAABBBBBB NaN NaN
#2 AAAA NaN NaN AAAAAAAACCCCCCC NaN
#3 AAAA NaN NaN NaN AAAAAAAADDDDDDD
#4 BBBB BBBBBBAAAAA NaN NaN NaN
#5 BBBB NaN BBBBBBBBBBBBB NaN NaN

I guess I need df.groupby("description") followed by one more step, but I'm missing that last piece.
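One way the groupby idea could be completed is with `GroupBy.first()`, which keeps the first non-null value of each column within a group. The following is a minimal sketch, not the accepted approach: it rebuilds the sparse intermediate frame from a list of dicts rather than with `apply`, and the names `rows`, `wide`, and `collapsed` are illustrative only.

```python
import pandas as pd

data = {"description": ["AAAA:A", "AAAA:B", "AAAA:C", "AAAA:D", "BBBB:A", "BBBB:B"],
        "sequence": ["AAAAAAAAAAA", "AAAAAAABBBBBB", "AAAAAAAACCCCCCC", "AAAAAAAADDDDDDD",
                     "BBBBBBAAAAA", "BBBBBBBBBBBBB"]}

# One dict per input row: the 4-letter prefix plus a single sequence_<suffix> key.
# Keys missing from a dict become NaN when the frame is built.
rows = [
    {"description": desc.split(":")[0],
     "sequence_" + desc.split(":")[1]: seq}
    for desc, seq in zip(data["description"], data["sequence"])
]
wide = pd.DataFrame(rows)  # same sparse shape as the intermediate frame above

# first() takes the first non-null entry per column within each group,
# which collapses the sparse rows into one row per description.
collapsed = wide.groupby("description", as_index=False).first()
print(collapsed)
```

This only gives a sensible result when each (prefix, suffix) pair occurs at most once; with duplicates, `first()` would silently keep the earliest value.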

Best answer

Use split, then pivot:

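# Starting from the original df (before the apply step in the question):
# split "AAAA:A" into the shared prefix (New1) and the suffix (New2)
df[['New1','New2']]=df.description.str.split(':',expand=True)
s=df[['New1','New2','sequence']]

# Keyword arguments are used because recent pandas versions reject positional arguments to pivot
s.pivot(index='New1', columns='New2', values='sequence').add_prefix('sequence_')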

Out[863]:
New2 sequence_A sequence_B sequence_C sequence_D
New1
AAAA AAAAAAAAAAA AAAAAAABBBBBB AAAAAAAACCCCCCC AAAAAAAADDDDDDD
BBBB BBBBBBAAAAA BBBBBBBBBBBBB None None
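For reference, the split-then-pivot approach can be written as a self-contained script that also turns the index back into a `description` column, matching the layout requested in the question. This is a sketch under assumptions: the column names `prefix` and `suffix` and the variables `tidy` and `wide` are hypothetical and not taken from the answer.

```python
import pandas as pd

data = {"description": ["AAAA:A", "AAAA:B", "AAAA:C", "AAAA:D", "BBBB:A", "BBBB:B"],
        "sequence": ["AAAAAAAAAAA", "AAAAAAABBBBBB", "AAAAAAAACCCCCCC", "AAAAAAAADDDDDDD",
                     "BBBBBBAAAAA", "BBBBBBBBBBBBB"]}
df = pd.DataFrame(data)

# Split "AAAA:A" into two columns: the shared prefix and the suffix letter.
parts = df["description"].str.split(":", expand=True)
parts.columns = ["prefix", "suffix"]
tidy = parts.assign(sequence=df["sequence"])

# One row per prefix, one column per suffix; missing combinations become NaN.
wide = tidy.pivot(index="prefix", columns="suffix", values="sequence")
wide = wide.add_prefix("sequence_")

# Tidy up the axis labels and move the prefix back into a regular column.
wide.index.name = "description"
wide.columns.name = None
wide = wide.reset_index()
print(wide)
```

Note that `pivot` raises an error if the same (prefix, suffix) pair appears more than once; in that case `pivot_table` with an explicit aggregation function would be needed instead.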

Regarding "python - Collapsing rows of a pandas DataFrame that share a column value", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49391846/
