
python - Find a list of text and replace at the matching field

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:09:27

I admit the title is vague for my problem; I could not find a clearer way to phrase it. I am new to programming and still picking up the terminology.

I have two files. File A looks like this:

CHROM   POS ID  AGM12   AGM14   AGM15   AGM18 ..
1 14930 rs150145850 0/0 1/1 0/0 0/0 ..
1 14933 rs138566748 0/0 0/0 0/0 0/0 ..
1 63671 rs116440577 0/1 0/0 0/0 0/0 ..
2 808922 rs6594027 0/0 0/0 0/0 0/1 ..
2 753474 rs2073814 1/0 0/0 0/1 0/0 ..
3 753405 rs61770173 0/0 1/1 0/0 1/0 ..
...
...
...

File B looks like this:

CHROM   POS rsID    Sample_ID
1 14930 rs150145850 AGM15
2 808922 rs6594027 AGM18
3 753405 rs61770173 AGM12
...
...
...

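I want to use the POS field (column 2) of file B to find the matching row in file A, and replace the value in that row's corresponding Sample_ID column with NA.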

For example, the output should look like this:

CHROM   POS ID  AGM12   AGM14   AGM15   AGM18
1 14930 rs150145850 0/0 1/1 NA 0/0
1 14933 rs138566748 0/0 0/0 0/0 0/0
1 63671 rs116440577 0/1 0/0 0/0 0/0
2 808922 rs6594027 0/0 0/0 0/0 NA
2 753474 rs2073814 1/0 0/0 0/1 0/0
3 753405 rs61770173 NA 1/1 0/0 1/0

How can I do this in Python or Unix?

Best answer

Here is a version that uses the csv module (I am assuming your columns are tab-separated).

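import csv
import collections

a = 'path/to/a'
b = 'path/to/b'
output = 'output/path'

# Map each POS value in file B to the sample columns that should become NA.
pos = collections.defaultdict(list)

with open(b, newline='') as csvin:
    reader = csv.DictReader(csvin, delimiter='\t')
    for line in reader:
        pos[line['POS']].append(line['Sample_ID'])

# On Python 3 the csv module expects text-mode files opened with newline=''
# (on Python 2, open the output with 'wb' instead).
with open(a, newline='') as csvin, open(output, 'w', newline='') as csvout:
    reader = csv.DictReader(csvin, delimiter='\t')
    writer = csv.DictWriter(csvout, fieldnames=reader.fieldnames, delimiter='\t')
    writer.writeheader()
    for line in reader:
        # Sample columns flagged for this position (empty list if none).
        fields = pos.get(line['POS'], [])
        for field in fields:
            line[field] = 'NA'
        writer.writerow(line)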
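
The script above matches rows on POS alone, so the same position on two different chromosomes would be treated as one. The following is a minimal sketch of a variation, not part of the original answer, that keys on the (CHROM, POS) pair instead. The function name mask_genotypes is hypothetical, the input is still assumed to be tab-separated, and the in-memory sample data is built from a few rows of the question so the example runs without any files.

```python
import csv
import io
from collections import defaultdict


def mask_genotypes(a_file, b_file, out_file, delimiter="\t"):
    """Copy a_file to out_file, writing 'NA' into the sample columns listed in b_file."""
    # Map (CHROM, POS) -> list of sample columns to overwrite.
    targets = defaultdict(list)
    for row in csv.DictReader(b_file, delimiter=delimiter):
        targets[(row["CHROM"], row["POS"])].append(row["Sample_ID"])

    reader = csv.DictReader(a_file, delimiter=delimiter)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for sample in targets.get((row["CHROM"], row["POS"]), []):
            if sample in row:  # ignore sample IDs that have no column in file A
                row[sample] = "NA"
        writer.writerow(row)


# Illustrative data taken from the question (tabs assumed as separators).
file_a = io.StringIO(
    "CHROM\tPOS\tID\tAGM12\tAGM14\tAGM15\tAGM18\n"
    "1\t14930\trs150145850\t0/0\t1/1\t0/0\t0/0\n"
    "2\t808922\trs6594027\t0/0\t0/0\t0/0\t0/1\n"
    "3\t753405\trs61770173\t0/0\t1/1\t0/0\t1/0\n"
)
file_b = io.StringIO(
    "CHROM\tPOS\trsID\tSample_ID\n"
    "1\t14930\trs150145850\tAGM15\n"
    "2\t808922\trs6594027\tAGM18\n"
    "3\t753405\trs61770173\tAGM12\n"
)
result = io.StringIO()
mask_genotypes(file_a, file_b, result)
print(result.getvalue())
```

With real files, the same function can be called with handles opened using newline='' in place of the io.StringIO objects. If your files are separated by spaces rather than tabs, pass a different delimiter; that is also an assumption you would need to check against your data.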

Regarding "python - Find a list of text and replace at the matching field", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13514234/
