
python - How to combine two data frames only where there is a common index, otherwise keep empty cells

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:06:26

I have two files. file1.txt:

ID  Gene    ShortName   TSS
A   ENS1S   Gm16088     TSS82763
B   ENS2S   Gm26206     TSS81070
C   ENS3S   Rp1         TSS11475
D   ENS4S   Gm22848     TSS18078
E   ENS5S   Sox17       TSS56047,TSS74369

file2.txt:

ID  Type    Condition
B   Normal  2
J   Cancer  1
K   Cancer  2
A   Normal  3

My desired output is file1.txt with the values from file2 added for the rows whose first column matches:

ID  Gene    ShortName   TSS                 Type    Condition
A   ENS1S   Gm16088     TSS82763            Normal  3
B   ENS2S   Gm26206     TSS81070            Normal  2
C   ENS3S   Rp1         TSS11475
D   ENS4S   Gm22848     TSS18078
E   ENS5S   Sox17       TSS56047,TSS74369

Hence, the Type and Condition columns of file2.txt will be added. If a value is in file1 but not in file2, it will be replaced by an empty cell. If a value is in file2 but not in file1, it will be ignored. Here is what I have tried so far, and it is not working. I read the two files into data frames and then tried merge or join:

import pandas as pd

df1 = pd.read_csv("file1.txt", index_col=0, sep="\t")
df2 = pd.read_csv("file2.txt", index_col=0, sep="\t")

result2 = pd.merge(df1, df2, on=df1.index, how="left")
result2.to_csv("Merged.xls", sep="\t")

I also tried pd.concat with axis=1, but that did not work either.
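pd.concat(..., axis=1) aligns frames on their index, but its join parameter only accepts 'outer' or 'inner', so on its own it either keeps the file2-only IDs (J, K) or drops the file1 rows that have no match (C, D, E). A minimal sketch of a left-style concat, assuming the files are whitespace-separated and read with ID as the index (the sample rows are inlined with io.StringIO so it runs without files), is to reindex df2 to df1's index first:

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs without files.
file1 = io.StringIO(
    "ID Gene ShortName TSS\n"
    "A ENS1S Gm16088 TSS82763\n"
    "B ENS2S Gm26206 TSS81070\n"
    "C ENS3S Rp1 TSS11475\n"
    "D ENS4S Gm22848 TSS18078\n"
    "E ENS5S Sox17 TSS56047,TSS74369\n"
)
file2 = io.StringIO(
    "ID Type Condition\n"
    "B Normal 2\n"
    "J Cancer 1\n"
    "K Cancer 2\n"
    "A Normal 3\n"
)

# Assumption: fields are separated by runs of whitespace, with ID as the index.
df1 = pd.read_csv(file1, sep=r"\s+", index_col=0)
df2 = pd.read_csv(file2, sep=r"\s+", index_col=0)

# reindex() keeps only df1's IDs, in df1's order: IDs missing from df2 become NaN,
# and IDs that exist only in df2 (J, K) are dropped before concatenating.
result = pd.concat([df1, df2.reindex(df1.index)], axis=1)
print(result)
```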

Then I tried:

import csv

with open('file1.txt') as f:
    r = csv.reader(f, delimiter='\t')
    dict1 = {row[0]: row for row in r}

with open('file2.txt') as f:
    r = csv.reader(f, delimiter='\t')
    dict2 = {row[0]: row for row in r}

# I saw this on Stack Overflow. I am not sure why it is sorting the keys in
# alphabetical order, and I am unable to unsort them (any side tip on that?)
keys = set(dict1) | set(dict2)

with open('output.csv', 'w', newline='') as f:
    w = csv.writer(f, delimiter='\t')
    w.writerows([[key, '\t', dict1.get(key), '\t', dict2.get(key)]
                 for key in keys])

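This also does not give the desired output, and there are lots of "'" characters between the strings. Is there a recommended way to do this? I know how to merge data frames when they have the same number of rows and the same index, but I cannot do it when I only want to use the first file as the reference index. I know how to do this in R with the merge function and its by.x and by.y arguments, but R messes up all my header names (the files above are just an example), so I would prefer to do it in Python.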
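In the csv attempt, the stray "'" characters come from writing whole Python lists as single cells: dict1.get(key) returns the full row, so the cell holds the list's printed form. The extra '\t' items are also unnecessary, because csv.writer already inserts the delimiter. The ordering problem comes from set(), which has no defined iteration order (it is not actually sorting). A minimal standard-library sketch of a left join that walks file1 in its own row order is shown below. It assumes whitespace-separated files whose fields never contain spaces; the function name left_join and the output path are hypothetical.

```python
import csv


def left_join(path1, path2, out_path):
    """Write file1's rows with file2's extra columns appended where the ID matches."""
    # Assumption: columns are separated by whitespace and fields contain no spaces.
    with open(path1) as f:
        rows1 = [line.split() for line in f if line.strip()]
    with open(path2) as f:
        rows2 = [line.split() for line in f if line.strip()]

    header1, data1 = rows1[0], rows1[1:]
    header2, data2 = rows2[0], rows2[1:]

    # Map each file2 ID to its remaining columns; IDs only in file2 are never looked up.
    extra = {row[0]: row[1:] for row in data2}
    # Empty cells used when a file1 ID has no match in file2.
    blank = [""] * (len(header2) - 1)

    with open(out_path, "w", newline="") as f:
        w = csv.writer(f, delimiter="\t")
        w.writerow(header1 + header2[1:])
        # Iterating the list data1 (not a set) preserves file1's row order.
        for row in data1:
            w.writerow(row + extra.get(row[0], blank))
```

Usage would be something like left_join("file1.txt", "file2.txt", "output.tsv"). Because each row is written as a flat list of strings, the output contains no list brackets or quote characters.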

Best answer

Reading the files with sep='\t' does not parse them correctly, but sep=r'\s+' parses the sample rows correctly, and then a standard merge gives the result you want:

import pandas as pd

df1 = pd.read_csv('file1.txt', sep=r'\s+')
df2 = pd.read_csv('file2.txt', sep=r'\s+')
df1.merge(df2, on='ID', how='left')

  ID   Gene ShortName                TSS    Type  Condition
0  A  ENS1S   Gm16088           TSS82763  Normal          3
1  B  ENS2S   Gm26206           TSS81070  Normal          2
2  C  ENS3S       Rp1           TSS11475     NaN        NaN
3  D  ENS4S   Gm22848           TSS18078     NaN        NaN
4  E  ENS5S     Sox17  TSS56047,TSS74369     NaN        NaN
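The answer's code can be checked without files by feeding the sample rows through io.StringIO. The sketch below also shows why sep='\t' fails on the space-separated sample (every line lands in a single column). When writing the result, to_csv renders NaN as an empty cell by default (na_rep=''), which matches the "empty cell" requirement in the question. The float_format='%g' argument is an optional choice made here to print Condition as 3 rather than 3.0, since the column becomes float once NaN values appear.

```python
import io

import pandas as pd

text1 = """ID Gene ShortName TSS
A ENS1S Gm16088 TSS82763
B ENS2S Gm26206 TSS81070
C ENS3S Rp1 TSS11475
D ENS4S Gm22848 TSS18078
E ENS5S Sox17 TSS56047,TSS74369
"""
text2 = """ID Type Condition
B Normal 2
J Cancer 1
K Cancer 2
A Normal 3
"""

# With sep="\t", a space-separated file is read as one column per line.
wrong = pd.read_csv(io.StringIO(text1), sep="\t")
print(wrong.shape)  # (5, 1)

df1 = pd.read_csv(io.StringIO(text1), sep=r"\s+")
df2 = pd.read_csv(io.StringIO(text2), sep=r"\s+")
merged = df1.merge(df2, on="ID", how="left")

# NaN is written as an empty string by default (na_rep=""), giving empty cells;
# "%g" prints the float Condition values 3.0 and 2.0 as 3 and 2.
tsv = merged.to_csv(sep="\t", index=False, float_format="%g")
print(tsv)
```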

You could of course also move "ID" into the index and use .join(), pd.concat(), or .merge(left_index=True, right_index=True), with the appropriate settings for a left merge in each case.
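A minimal sketch of two of these index-based variants, using a subset of the question's sample rows inlined via io.StringIO: with ID as the index (index_col=0, as in the question's own read_csv calls), DataFrame.join() already defaults to how='left', while merge() takes left_index=True and right_index=True rather than on=df1.index.

```python
import io

import pandas as pd

# A subset of the question's sample rows, inlined for a self-contained example.
text1 = """ID Gene ShortName TSS
A ENS1S Gm16088 TSS82763
B ENS2S Gm26206 TSS81070
C ENS3S Rp1 TSS11475
"""
text2 = """ID Type Condition
B Normal 2
J Cancer 1
A Normal 3
"""

# index_col=0 makes ID the index of both frames.
df1 = pd.read_csv(io.StringIO(text1), sep=r"\s+", index_col=0)
df2 = pd.read_csv(io.StringIO(text2), sep=r"\s+", index_col=0)

# join() aligns on the index and uses how="left" unless told otherwise.
joined = df1.join(df2)

# The merge equivalent joins index to index instead of naming columns with on=.
merged = df1.merge(df2, left_index=True, right_index=True, how="left")

print(joined)
print(merged)
```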

A similar question on this topic (python - How to combine two data frames only where there is a common index, otherwise keep empty cells) can be found on Stack Overflow: https://stackoverflow.com/questions/34550313/
