gpt4 book ai didi

Python - 匹配模式,打印模式及其后面的n行

转载 作者:行者123 更新时间:2023-12-01 09:33:10 26 4
gpt4 key购买 nike

我有一个这样的文件(包含+10000个序列,+98000行):

>DILT_0000000001-mRNA-1
MKVVKICSKLRKFIESRKDAVLPEQEEVLADLWAFEGISEFQMERFAKAAQCFQHQYELA
IKANLTEHASRSLENLGRARARLYDYQGALDAWTKRLDYEIKGIDKAWLHHEIGRAYLEL
NQYEEAIDHAATARDVADREADMEWDLNATVLIAQAHFYAGNLEEAKVYFEAAQNAAFRK
GFFKAESVLAEAIAEVDSEIRREEAKQERVYTKHSVLFNEFSQRAVWSEEYSEELHLFPF
AVVMLRCVLARQCTVHLQFRSCYNL
>DILT_0000000101-mRNA-1
MSCRRLSMNPGEALIKESSAPSRENLLKPYFDEDRCKFRHLTAEQFSDIWSHFDLDGVNE
LRFILRVPASQQAGTGLRFFGYISTEVYVHKTVKVSYIGFRKKNNSRALRRWNVNKKCSN
AVQMCGTSQLLAIVGPHTQPLTNKLCHTDYLPLSANFA
>DILT_0001999301-mRNA-1
LEHGIQPDGQMPSDKTIGGGDDSFQTFFSETGAGKHVPRAVMVDLEPTVIGEYLCVLLTS
FILFRLISTNLGPNSQLASRTLLFAADKTTLFRLLGLLPWSLLKIAVQ
>DILT_0001999401-mRNA-1
MAENGEDANMPEEGKEGNTQDQGEHQQDVQSDEPNEADSGYSSAASSDVNSQTIPITVIL
PNREAVNLSFDPNISVSELQERLNGPGITRLNENLFFTYSGKQLDPNKTLLDYKVQKSST
LYVHETPTALPKSAPNAKEEGVVPSNCLIHSGSRMDENRCLKEYQLTQNSVIFVHRPTAN
TAVQNREEKTSSLEVTVTIRETGNQLHLPINPHXXXXTVEMHVAPGVTVGDLNRKIAIKQ

所有带“>”的行都是 ID。以下几行是有关 ID 的序列。

我还有一个文件,其中包含我想要的序列 ID,例如:

DILT_0000000001-mRNA-1
DILT_0000000101-mRNA-1
DILT_0000000201-mRNA-1
DILT_0000000301-mRNA-1
DILT_0000000401-mRNA-1
DILT_0000000501-mRNA-1
DILT_0000000601-mRNA-1
DILT_0000000701-mRNA-1
DILT_0000000801-mRNA-1
DILT_0000000901-mRNA-1

我想编写一个脚本来匹配 ID 并复制该 ID 的序列,但我只是获取 ID,而没有序列。

seqs = open('WBPS10.protein.fa').readlines()
ids = open('ids.txt').readlines()

for line in ids:
for record in seqs:
if line == record[1:]:
print record

我不知道要写什么来获取 ID 后的“n”行,因为有时它是 2 行,其他序列有更多,正如您在我的示例中看到的那样。

问题是,我试图在不使用 Biopython 的情况下做到这一点,这会容易得多。我只是想学习其他方法。

最佳答案

seqs_by_ids = {}
with open('WBPS10.protein.fa', 'r') as read_file:
for line in read_file.readlines():
if line.startswith('>'):
current_key = line[1:].strip()
seqs_by_ids[current_key] = ''
else:
seqs_by_ids[current_key] += line.strip()

ids = set([line.strip() for line in open('ids.txt').readlines()])

for id in ids:
if id in seqs_by_ids:
print(id)
print('\t{}'.format(seqs_by_ids[id]))

输出:

DILT_0000000001-mRNA-1
MKVVKICSKLRKFIESRKDAVLPEQEEVLADLWAFEGISEFQMERFAKAAQCFQHQYELAIKANLTEHASRSLENLGRARARLYDYQGALDAWTKRLDYEIKGIDKAWLHHEIGRAYLELNQYEEAIDHAATARDVADREADMEWDLNATVLIAQAHFYAGNLEEAKVYFEAAQNAAFRKGFFKAESVLAEAIAEVDSEIRREEAKQERVYTKHSVLFNEFSQRAVWSEEYSEELHLFPFAVVMLRCVLARQCTVHLQFRSCYNL
DILT_0000000101-mRNA-1
MSCRRLSMNPGEALIKESSAPSRENLLKPYFDEDRCKFRHLTAEQFSDIWSHFDLDGVNELRFILRVPASQQAGTGLRFFGYISTEVYVHKTVKVSYIGFRKKNNSRALRRWNVNKKCSNAVQMCGTSQLLAIVGPHTQPLTNKLCHTDYLPLSANFA

关于Python - 匹配模式,打印模式及其后面的n行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/49777220/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com