
python - Combining strings, extracting substrings

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:22:53

(I'm using Python.)

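I'm working with a large file of RNA sequences, and I'm trying to reformat it for use in a clustering program. My file contains two types of lines: 1) the bacterium's accession number, (period) the nucleotide at which the sequence starts, (period) the nucleotide at which it ends; 2) the lines of the actual sequence itself (spread over multiple lines, even though it is one continuous sequence):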

>A45315.1.1521\n
GACGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAGCGCAGGAAGCCGGCGGAUCCC\n
UUCGGGGUGAANCCGGUGGAAUGAGCGGCGGACGGGUGAGUAACACGUGGGCAACCUACC\n
UUGUAGACUGGGAUAACUCCGGGAAACCGGGGCUAAUACCGGAUGAUCAUUUGGAUCGCAU\n
GAUCCGAAUGUAAAAGUGGGGAUUUAUCCUCACACUGCAAGAUGGGCCCGCGGCGCA…..
>A93610.15.1301\n
CCACUGCUAUGGGGGUCCGACUAAGCCAUGCGAGUCAUGGGGUCCCUCUGGGACACCACC\n
GGCGGACGGCUCAGUAACACGUCGGUAACCUACCCUCGGGAGGGGGAUAACCCCGGGAAA\n
CUGGGGCUAAUCCCCCAUAGGCCUGAGGUACUGGAAGGUCCUCAGGCCGAAAGGGGCUU….

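I need to create something that looks at the lines starting with >, then goes to the number after the first period (for the records above, that is 1 and 15). Counting from that number (1 or 15 in the example above), it needs to extract the nucleotides (As, Cs, Gs, or Us) from position 69 through 497 (note that I removed a bunch of nucleotides from this example).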

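So for my attempt, I figured it would make sense to turn the nucleotide sequence into one long string and then try to extract the nucleotides from it. But I can't seem to turn the lines of the RNA sequence into one long string (see below for my attempt). And once I have the big string, I'm not sure how to pull out the right nucleotides. I would need to write something like s = [x:497], where x is 69 minus the start number (the one after the first period).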

#!/usr/bin/env python
#Make a program that takes SSURef_NR99 file of sequences, makes a new file of
#Accession numbers and size of 16S.
import re
infilename = 'SSUtestdata.txt'
outfilename = 'SSUtestdata3.txt'

#Here I'm trying to search for one of the nucleotides, an end of line character and another nucleotide, trying to make a long string.

replace = re.compile(r'([A|C|G|U])(\n)([A|C|G|U])')

#remove extra letters and spaces
with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
    for line in infile:
        line = replace.sub(r'\1\3', line)

        #Write to OutFile
        outfile.write(line)
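
One likely reason the attempt above never joins anything: iterating over the file yields one line at a time, and each line ends at its newline, so the pattern never sees a newline followed by another nucleotide. A minimal sketch of that behaviour, and of joining with str.join instead, follows. The header `>X00000.1.8` and the two short sequence lines are made-up sample data, not taken from the real file.

```python
import re

# Same pattern as in the attempt above.
replace = re.compile(r'([A|C|G|U])(\n)([A|C|G|U])')

# Hypothetical sample: one header line and two sequence lines.
lines = ['>X00000.1.8\n', 'GACG\n', 'AACG\n']

# Applied line by line, nothing follows the '\n', so no substitution happens.
per_line = [replace.sub(r'\1\3', line) for line in lines]

# Stripping each sequence line and joining gives one continuous string.
joined = ''.join(line.strip() for line in lines[1:])

print(per_line == lines)  # True: every line is unchanged
print(joined)             # GACGAACG
```

Joining stripped lines avoids the regular expression entirely, and the joined string can then be sliced directly.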

Thanks for any ideas!

Best answer

If I understand your question correctly, this should do it:

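with open('path/to/input') as infile:
    while True:
        try:
            # Header line, e.g. '>A45315.1.1521' -> start=1, end=1521
            line = infile.readline()
            _, start, end = line.strip().split('.')
            start, end = int(start), int(end)
            # Skip the first start-1 characters, plus one more per newline crossed
            beg = infile.read(start - 1)
            infile.read(beg.count('\n'))
            # Read the sequence and make up for the newlines it contains
            seq = infile.read(end - start)
            extra = infile.read(seq.count('\n'))
            seq = seq.replace('\n', '') + extra
            print(seq)
        except ValueError:
            # A line that is not a three-part header (including '' at end of file) stops the loop
            break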

Regarding python - Combining strings, extracting substrings, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20037983/
