
python - Split a Pandas DataFrame column in two after the first letter in each cell

Reposted. Author: 太空狗. Updated: 2023-10-30 00:34:08

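Question

I want to split one column of a pandas DataFrame into two columns. In the Percentage column (see below), every entry starts with an uppercase letter. I want to split the Percentage column immediately after that letter and label the new column "Amino Acid".

Current code: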

import pandas as pd

df = pd.read_csv('foo.csv')

df['Amino Acid'], df['Percentage'] = zip(*df['Percentage'].map(lambda x: x.split('[^a-zA-Z]')))

df.to_csv('bar.csv',index=False)

Sample input data

+-----------------------------+-------+-----+-----------+---------------------------------------------------------------------------------------------+
| Species | ID | OGT | DB | Percentage |
+-----------------------------+-------+-----+-----------+---------------------------------------------------------------------------------------------+
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | E is 8.333003365670164% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | R is 6.310991522830762% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | A is 10.22668778459711% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
+-----------------------------+-------+-----+-----------+---------------------------------------------------------------------------------------------+

Sample of desired output

+-----------------------------+-------+-----+-----------+------------+--------------------------------------------------------------------------------------------+
| Species | ID | OGT | DB | Amino Acid | Percentage |
+-----------------------------+-------+-----+-----------+------------+--------------------------------------------------------------------------------------------+
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | E | is 8.333003365670164% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | R | is 6.310991522830762% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | A | is 10.22668778459711% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
+-----------------------------+-------+-----+-----------+------------+--------------------------------------------------------------------------------------------+

Best answer

You can extract the first letter directly with string slicing:

df['Amino Acid'] = df['Percentage'].str[0]
df['Percentage'] = df['Percentage'].str[1:]
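
As a rough illustration of the slicing approach, the sketch below builds a small DataFrame in place of foo.csv. The shortened file path `./archaea/a.faa` and the variable names `sliced`, `parts`, and `extracted` are assumptions made for this example, not part of the original question. It also shows one possible regex alternative using `Series.str.extract`. Note that Python's built-in `str.split` treats its argument as a literal separator rather than a regular expression, which is why the `x.split('[^a-zA-Z]')` call in the question does not split anything.

```python
import pandas as pd

# Hypothetical stand-in for foo.csv; the file paths are shortened for readability.
df = pd.DataFrame({
    "Species": ["Halogeometricum borinquense"] * 3,
    "Percentage": [
        "E is 8.333003365670164% in ./archaea/a.faa",
        "R is 6.310991522830762% in ./archaea/a.faa",
        "A is 10.22668778459711% in ./archaea/a.faa",
    ],
})

# Slicing approach from the answer: first character, then the remainder.
# str.strip() is an optional extra step that drops the space left after the letter.
sliced = df.copy()
sliced["Amino Acid"] = sliced["Percentage"].str[0]
sliced["Percentage"] = sliced["Percentage"].str[1:].str.strip()

# Regex alternative: capture one leading uppercase letter and the rest of the text.
# Rows that do not start with an uppercase letter would become NaN in both columns.
parts = df["Percentage"].str.extract(r"^([A-Z])\s*(.*)$")
extracted = df.copy()
extracted["Amino Acid"] = parts[0]
extracted["Percentage"] = parts[1]

print(sliced[["Amino Acid", "Percentage"]])
```

Slicing is the simpler choice when every cell is known to begin with exactly one letter. The regex variant may be preferable if some rows might not follow that pattern, since non-matching rows show up as missing values instead of being split at the wrong place. In both cases the new column is appended at the end of the DataFrame, so reordering the columns is a separate step if "Amino Acid" should come before "Percentage".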

Regarding python - Split a Pandas DataFrame column in two after the first letter in each cell, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51243702/
