
python - Fast way to split rows on a specific column in a Pandas DataFrame

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:51:55

I have the following data frame:

import pandas as pd
df = pd.DataFrame({'Probes':["1415693_at","1415693_at"],
'Genes':["Canx","LOC101056688 /// Wars "],
'cv_filter':[ 0.134,0.290],
'Organ' :["LN","LV"]} )
df = df[["Probes","Genes","cv_filter","Organ"]]

It looks like this:

In [16]: df
Out[16]:
       Probes                   Genes  cv_filter Organ
0  1415693_at                    Canx      0.134    LN
1  1415693_at  LOC101056688 /// Wars       0.290    LV

What I want to do is split each row on the entries of the Genes column, which are separated by "///".

The result I want is:

       Probes         Genes  cv_filter Organ
0  1415693_at          Canx      0.134    LN
1  1415693_at  LOC101056688      0.290    LV
2  1415693_at          Wars      0.290    LV

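I have about 150K rows in total to process. Is there a fast way to do this?

Best answer

You can first split the Genes column on the separator, build a new Series from the pieces, and then join it back to the original df: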

import pandas as pd
df = pd.DataFrame({'Probes':["1415693_at","1415693_at"],
'Genes':["Canx","LOC101056688 /// Wars "],
'cv_filter':[ 0.134,0.290],
'Organ' :["LN","LV"]} )
df = df[["Probes","Genes","cv_filter","Organ"]]
print(df)
#        Probes                   Genes  cv_filter Organ
# 0  1415693_at                    Canx      0.134    LN
# 1  1415693_at  LOC101056688 /// Wars       0.290    LV

# split every Genes entry on '///' and stack the pieces into one long Series
s = pd.DataFrame([x.split('///') for x in df['Genes'].tolist()], index=df.index).stack()
# or you can use the approach from the comment:
# s = df['Genes'].str.split('///', expand=True).stack()

# discard any padding values that survive stack(), then strip the
# whitespace left around the separator
s = s.dropna().str.strip()
# drop the inner index level so the index matches the original row labels
s.index = s.index.droplevel(-1)
s.name = 'Genes'
print(s)
# 0            Canx
# 1    LOC101056688
# 1            Wars
# Name: Genes, dtype: object

# remove the original column first, otherwise join raises:
# ValueError: columns overlap but no suffix specified: Index([u'Genes'], dtype='object')
df = df.drop('Genes', axis=1)

df = df.join(s).reset_index(drop=True)
print(df[["Probes","Genes","cv_filter","Organ"]])
#        Probes         Genes  cv_filter Organ
# 0  1415693_at          Canx      0.134    LN
# 1  1415693_at  LOC101056688      0.290    LV
# 2  1415693_at          Wars      0.290    LV
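
On pandas 0.25 or later, the same split-and-repeat idea can probably be written more compactly with DataFrame.explode, which is not part of the original answer. The sketch below assumes that version; the helper name split_gene_rows and its parameters are made up for illustration. It also strips the whitespace left around the separator.

```python
import pandas as pd

df = pd.DataFrame({
    "Probes": ["1415693_at", "1415693_at"],
    "Genes": ["Canx", "LOC101056688 /// Wars "],
    "cv_filter": [0.134, 0.290],
    "Organ": ["LN", "LV"],
})


def split_gene_rows(frame, column="Genes", sep="///"):
    """Hypothetical helper: return a copy with one row per sep-delimited entry."""
    out = frame.copy()
    out[column] = out[column].str.split(sep)  # each cell becomes a list of pieces
    out = out.explode(column)                 # one row per piece, other columns repeated
    out[column] = out[column].str.strip()     # remove padding around the separator
    return out.reset_index(drop=True)


result = split_gene_rows(df)
print(result)
```

Because explode keeps the column order and repeats the remaining columns for each piece, there is no need to drop Genes and join it back afterwards. Whether it is faster than the stack-based version on 150K rows is worth timing on your own data.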

Regarding "python - Fast way to split rows on a specific column in a Pandas DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35769055/
