
python - Bioinformatics: Find Genes given a Genome String

Reposted. Author: 行者123. Last updated: 2023-11-28 16:27:10

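Biologists use a sequence of the letters A, C, T, and G to model a genome. A gene is a substring of a genome that starts after a triplet ATG and ends before a triplet TAG, TAA, or TGA. Furthermore, the length of a gene string is a multiple of 3, and the gene does not contain any of the triplets ATG, TAG, TAA, or TGA.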
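To make the rules above concrete, here is a small sketch of a checker for a single candidate gene (the text between a start and a stop codon). The name `is_gene` is hypothetical, and the sketch assumes two readings the statement leaves open: "contains a triplet" is taken to mean an in-frame codon (positions 0-2, 3-5, ...), not any three-letter substring, and an empty string is not counted as a gene.

```python
# Hypothetical helper that checks one candidate gene against the stated rules.
FORBIDDEN_CODONS = {"ATG", "TAG", "TAA", "TGA"}


def is_gene(candidate: str) -> bool:
    """True if `candidate` is a non-empty run of whole codons, none forbidden."""
    if not candidate or len(candidate) % 3 != 0:
        return False
    # Consecutive in-frame triplets: positions 0-2, 3-5, 6-8, ...
    codons = [candidate[k:k + 3] for k in range(0, len(candidate), 3)]
    return not any(codon in FORBIDDEN_CODONS for codon in codons)


print(is_gene("TTT"))     # True
print(is_gene("GGGCGT"))  # True
print(is_gene("TT"))      # False: length is not a multiple of 3
print(is_gene("CCCTAA"))  # False: contains the codon TAA
```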

Ideally, the program should behave like this:

Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT #Enter   
TTT
GGGCGT
-----------------
Enter a genome string: TGTGTGTATAT
No Genes Were Found

So far, I have:

def findGene(gene):
    final = ""
    genep = gene.split("ATG")  # pieces of the genome between occurrences of "ATG"
    for part in genep:
        for chr in part:  # iterates over the single characters of part
            for i in range(0, len(chr)):
                if genePool(chr[i:i + 3]) == 1:
                    break
                else:
                    final += (chr[i+i + 3] + "\n")
    return final

def genePool(part):
    g1 = "ATG"
    g2 = "TAG"
    g3 = "TAA"
    g4 = "TGA"
    if (part.count(g1) != 0) or (part.count(g2) != 0) or (part.count(g3) != 0) or (part.count(g4) != 0):
        return 1  # otherwise the function falls through and returns None

def main():
    geneinput = input("Enter a genome string: ")
    print(findGene(geneinput))

main()
# TTATGTTTTAAGGATGGGGCGTTAGTT

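I keep getting errors.

Honestly, this really isn't working for me. I think I've hit a dead end with these lines of code, so a new approach might help.

Thanks in advance!

The error I get: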

Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT
Traceback (most recent call last):
  File "D:\Python\Chapter 8\Bioinformatics.py", line 40, in <module>
    main()
  File "D:\Python\Chapter 8\Bioinformatics.py", line 38, in main
    print(findGene(geneinput))
  File "D:\Python\Chapter 8\Bioinformatics.py", line 25, in findGene
    final += (chr[i+i + 3] + "\n")
IndexError: string index out of range
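The traceback follows from how the loops in `findGene` walk the string: `for chr in part` iterates over the characters of `part`, so `chr` is always a one-character string. `range(0, len(chr))` then yields only `i = 0`, the slice `chr[0:3]` is a single letter that can never contain a triplet, so `genePool` returns `None` rather than 1, and the `else` branch evaluates `chr[0 + 0 + 3]`, i.e. `chr[3]`, which is out of range. A minimal reproduction using the sample genome from the question:

```python
genome = "TTATGTTTTAAGGATGGGGCGTTAGTT"
parts = genome.split("ATG")
print(parts)          # ['TT', 'TTTTAAGG', 'GGGCGTTAGTT']

ch = parts[0][0]      # the first value `chr` takes in findGene: 'T'
print(len(ch))        # 1, so range(0, len(ch)) only yields i = 0
print(repr(ch[0:3]))  # 'T': a slice past the end is silently shortened
try:
    ch[0 + 0 + 3]     # the same expression as chr[i+i + 3] with i = 0
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range
```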

As I said before, I'm not sure my current code is on the right track to solve this problem. Any new ideas, even in pseudocode, would be appreciated!
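One possible non-regex approach, sketched here under two assumptions (the triplet rule refers to in-frame codons, and empty genes do not count): scan left to right for ATG; from the codon after it, step through the string three letters at a time; stop at the first TAG, TAA or TGA and record what lies in between; abandon that start if an in-frame ATG appears first or the string runs out. The names `find_genes` and `report` are hypothetical.

```python
# Hypothetical non-regex sketch: scan codon by codon from each ATG.
START_CODON = "ATG"
STOP_CODONS = {"TAG", "TAA", "TGA"}


def find_genes(genome: str) -> list[str]:
    """Return the genes in `genome`, left to right, without overlaps."""
    genes = []
    n = len(genome)
    i = 0
    while i + 3 <= n:
        if genome[i:i + 3] != START_CODON:
            i += 1                      # not a start codon: slide by one letter
            continue
        j = i + 3                       # first codon after ATG
        end = None
        while j + 3 <= n:               # walk the in-frame codons
            codon = genome[j:j + 3]
            if codon in STOP_CODONS:
                end = j                 # gene ends just before this stop codon
                break
            if codon == START_CODON:
                break                   # in-frame ATG inside: reject this start
            j += 3
        if end is not None and end > i + 3:
            genes.append(genome[i + 3:end])
            i = end + 3                 # resume after the stop codon
        else:
            i += 1                      # no valid gene from this ATG
    return genes


def report(genome: str) -> str:
    """Format the result like the expected output in the question."""
    genes = find_genes(genome.strip())
    return "\n".join(genes) if genes else "No Genes Were Found"


print(report("TTATGTTTTAAGGATGGGGCGTTAGTT"))
print(report("TGTGTGTATAT"))
```

Resuming after the stop codon keeps genes from overlapping, while sliding by one letter otherwise lets the scan find an ATG in any reading frame. For the interactive program in the question, `main` could simply print `report(input("Enter a genome string: "))`.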

Best answer

This can be done with a regular expression:

import re

# ATG, then one or more whole codons (captured, lazily), then a stop codon
pattern = re.compile(r'ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)')
print(pattern.findall('TTATGTTTTAAGGATGGGGCGTTAGTT'))
print(pattern.findall('TGTGTGTATAT'))

Output:

['TTT', 'GGGCGT']
[]
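The `?` after `+` is what keeps each match from running past the first stop codon. A short comparison with a greedy `+` on a made-up genome string (not from the question) shows the difference. Note also that `findall` returns only the text of capturing group 1 when the pattern has exactly one group, which is why the output lists the genes without their start and stop codons.

```python
import re

# Same pattern as in the answer (lazy +?) and a greedy variant (+) for comparison.
lazy = re.compile(r'ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)')
greedy = re.compile(r'ATG((?:[ACTG]{3})+)(?:TAG|TAA|TGA)')

# Made-up genome: ATG, then TTT, a stop codon TAA, then GGG and another stop TAG.
genome = "ATGTTTTAAGGGTAG"

print(lazy.findall(genome))    # ['TTT']: stops at the first in-frame stop codon
print(greedy.findall(genome))  # ['TTTTAAGGG']: runs on to the last stop it can reach
```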

Explanation extracted from https://regex101.com/r/yI4tN9/3

"ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)"g
  ATG matches the characters ATG literally (case sensitive)
  1st Capturing group ((?:[ACTG]{3})+?)
    (?:[ACTG]{3})+? Non-capturing group
      Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
      [ACTG]{3} match a single character present in the list below
        Quantifier: {3} Exactly 3 times
        ACTG a single character in the list ACTG literally (case sensitive)
  (?:TAG|TAA|TGA) Non-capturing group
    1st Alternative: TAG
      TAG matches the characters TAG literally (case sensitive)
    2nd Alternative: TAA
      TAA matches the characters TAA literally (case sensitive)
    3rd Alternative: TGA
      TGA matches the characters TGA literally (case sensitive)
g modifier: global. All matches (don't return on first match). Python has no g flag; re.findall already returns all non-overlapping matches.
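One caveat, hedged: the pattern stops at the first in-frame stop codon, but it does not by itself enforce the rule that a gene contains none of ATG, TAG, TAA or TGA. An in-frame ATG inside a gene is accepted, and because `+?` must consume at least one codon, a stop codon directly after ATG can be absorbed into the gene. A possible tightening, assuming the rule refers to in-frame codons, puts a negative lookahead `(?!ATG|TAG|TAA|TGA)` in front of each codon. This variant is not part of the original answer, and the two extra test strings are made up for illustration.

```python
import re

# Pattern from the answer above.
original = re.compile(r'ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)')

# Hypothetical tightened variant: each codon inside the gene must not be
# ATG, TAG, TAA or TGA, checked with a negative lookahead before the codon.
strict = re.compile(r'ATG((?:(?!ATG|TAG|TAA|TGA)[ACTG]{3})+?)(?:TAG|TAA|TGA)')

samples = [
    "TTATGTTTTAAGGATGGGGCGTTAGTT",  # from the question
    "ATGCCCATGCCCTAA",              # made-up: in-frame ATG inside the gene
    "ATGTAATTTTAG",                 # made-up: stop codon right after ATG
]

for s in samples:
    print(s, original.findall(s), strict.findall(s))
# TTATGTTTTAAGGATGGGGCGTTAGTT ['TTT', 'GGGCGT'] ['TTT', 'GGGCGT']
# ATGCCCATGCCCTAA ['CCCATGCCC'] ['CCC']
# ATGTAATTTTAG ['TAATTT'] []
```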

Regarding "python - Bioinformatics: Find Genes given a Genome String", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35616698/
