
python - DNA sequence manipulation

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:40:23

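I am very new to programming and don't know much about any programming language. I bought a book on programming for biologists and have worked a few things out. What I want to do is take sequences from a file, then find and extract the variable region from them. My code is as follows: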

#!/usr/bin/python
# for extracting GAA sequences
import os
import sys
import re
# opens sequence file and defines it as reps
reps = open('142sequences.txt')
# defining what to read
line = reps.readlines()
# defines what we are looking for in rep lines
for line in reps:
    sear = re.search(r"C[A]{2,}G[ATCG]{17, 2700}AAT[A]{2,4}G[A]{2,}", reps)
    if sear:
        repeats = sear.group()
        print(repeats)
    else:
        print('Not Recognized')

I don't get anything returned. Please help.

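Best answer

You need to pass each line to re.search rather than reps, which is the file object itself (the list of all lines went into line):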

import re

with open('142sequences.txt') as reps:
    # iterate over each line in the file
    for line in reps:
        # pass each line to re.search; the quantifier is written {17,2700}
        # with no space, because re reads "{17, 2700}" as literal characters
        sear = re.search(r"C[A]{2,}G[ATCG]{17,2700}AAT[A]{2,4}G[A]{2,}", line)
        if sear:
            repeats = sear.group()
            print(repeats)
        else:
            print('Not Recognized')
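
One detail worth checking in a pattern like this one: Python's re module does not accept whitespace inside a {m,n} quantifier, so a form such as {17, 2700} is read as literal characters rather than as a repeat count. The sketch below shows the difference on a short made-up sequence; the names loose, strict and seq and the shortened patterns are illustrative only and are not part of the original question.

```python
import re

# Hypothetical, shortened patterns for illustration only.
loose = r"G[ATCG]{2, 4}T"    # space after the comma: braces are taken literally
strict = r"G[ATCG]{2,4}T"    # no space: a real "2 to 4 repeats" quantifier

seq = "GACGT"                # made-up test sequence

strict_match = re.search(strict, seq)
loose_match = re.search(loose, seq)

# The loose pattern only matches text that literally contains "{2, 4}".
literal_match = re.search(loose, "GA{2, 4}T")

print(strict_match.group() if strict_match else None)
print(loose_match)
print(literal_match.group() if literal_match else None)
```

If a pattern that looks right never matches any line, removing stray spaces from the quantifiers is a cheap first check.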

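Calling readlines reads every line into a list, so the for loop in your own code never actually runs: the initial readlines call has already consumed the file iterator, which is why nothing is printed. If the loop did run, it would raise an error, because re.search must be given a string, and reps is the file object rather than a string.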
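
The behaviour described above can be reproduced without the real data file. This is a minimal sketch that assumes io.StringIO is an acceptable stand-in for the open file; the two sample lines and the shortened pattern are made up for illustration.

```python
import io
import re

# io.StringIO stands in for open('142sequences.txt'); the content is made up.
reps = io.StringIO("CAAGTT\nTTTT\n")

lines = reps.readlines()   # reads everything; the handle is now at end of file
leftover = list(reps)      # iterating the handle again yields nothing

# re.search expects a str (or bytes-like object), not a file object.
try:
    re.search(r"CA{2,}G", reps)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Looping over the saved list of lines works as intended.
matches = []
for line in lines:
    found = re.search(r"CA{2,}G", line)
    if found:
        matches.append(found.group())

print(leftover, error_name, matches)
```

Either iterate over the file handle directly, as in the answer, or keep the list returned by readlines and loop over that; doing both on the same handle leaves the second loop with nothing to read.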

Regarding python - DNA sequence manipulation, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36848335/
