
python - Modifying the location of a GenBank feature

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:18:18

Edit: I know that feature.type gives gene/CDS and that feature.qualifiers gives "db_xref"/"locus_tag"/"inference", etc. Is there an attribute of the feature object that lets me access the location (e.g. [5240:7267](+)) directly?

This URL has more information, but I can't work out how to use it for my purposes... http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html#location_operator
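
For readers with the same question, here is a minimal sketch of reading a location directly, assuming Biopython is installed. The feature is built in memory rather than parsed from a file, and its coordinates are only illustrative. Biopython stores positions 0-based with an exclusive end, so the GenBank span 5240..7267 is held as [5239:7267].

```python
from Bio.SeqFeature import FeatureLocation, SeqFeature

# Illustrative feature mirroring the gyrB entry (GenBank span 5240..7267, + strand)
feature = SeqFeature(
    FeatureLocation(5239, 7267, strand=1),
    type="gene",
    qualifiers={"locus_tag": ["Rv0005"]},
)

# The location object exposes start, end and strand directly
start = int(feature.location.start)  # 0-based, inclusive
end = int(feature.location.end)      # 0-based, exclusive
strand = feature.location.strand     # 1 for forward, -1 for reverse

# Convert back to the 1-based inclusive form used in the GenBank file
genbank_span = "%d..%d" % (start + 1, end)
print(feature.location, genbank_span)
```

The same attributes are available on every feature yielded by record.features after SeqIO.parse.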

Original post:

I am trying to modify the location of a feature in a GenBank file. Essentially, I want to change the following part of the GenBank file:

     gene            5240..7267
                     /db_xref="GeneID:887081"
                     /locus_tag="Rv0005"
                     /gene="gyrB"
     CDS             5240..7267
                     /locus_tag="Rv0005"
                     /inference="protein motif:PROSITE:PS00177"
                     ...........................

to:

     gene            5357..7267
                     /db_xref="GeneID:887081"
                     /locus_tag="Rv0005"
                     /gene="gyrB"
     CDS             5357..7267
                     /locus_tag="Rv0005"
                     /inference="protein motif:PROSITE:PS00177"
                     .............................

Note the change from 5240 to 5357.

So far, from searching the internet and Stack Overflow, I have got this:

from Bio import SeqIO

gb_file = "mtbtomod.gb"
gb_record = SeqIO.parse(open(gb_file, "r"), "genbank")
rvnumber = 'Rv0005'
newstart = 5357

final_features = []

for record in gb_record:
    for feature in record.features:
        if feature.type == "gene":
            if feature.qualifiers["locus_tag"][0] == rvnumber:
                if feature.location.strand == 1:
                    # Records the intended position as a new qualifier only
                    feature.qualifiers["amend_position"] = "%s:%s" % (newstart, feature.location.end + 1)
                else:
                    # do the reverse for the complementary strand
                    pass
        final_features.append(feature)
    record.features = final_features
    with open("testest.gb", "w") as testest:
        SeqIO.write(record, testest, "genbank")

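This basically creates a new qualifier called "amend_position". What I actually want is to modify the location itself (whether or not that means creating a new file...).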

Rv0005 is just one example of a locus_tag I need to update. I have about 600 more locations to update, hence the need for a script. Help!

Best answer

OK, I have it fully working now. I'm posting the code in case anyone needs something similar.

__author__ = 'Kavin'

import os
import sys

import xlrd  # note: xlrd >= 2.0 reads only legacy .xls workbooks
from Bio import SeqFeature, SeqIO

workbook = xlrd.open_workbook(sys.argv[2])
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]

# Map each locus_tag to its re-annotated start codon and strand.
# The row bounds are kept from the original script: the first two rows
# and the last row of the sheet are skipped.
TSS = {}
for i in range(2, sheet.nrows - 1):
    if data[i][5] < -0.7:  # Keep only rows whose BASS score is within the significant range
        TSS[str(data[i][0])] = {
            'Direction': str(data[i][3]),
            'StartCodon': int(data[i][4]),
        }

# Create an output filename based on the input filename, e.g. x.gb -> x_modified.gb
root, ext = os.path.splitext(sys.argv[1])
outfile = root + '_modified' + ext

records = []
with open(sys.argv[1]) as handle:
    for record in SeqIO.parse(handle, "genbank"):
        for feature in record.features:
            if feature.type not in ("gene", "CDS"):
                continue
            # Some features may lack a locus_tag; skip those instead of raising KeyError
            locus_tag = feature.qualifiers.get("locus_tag", [None])[0]
            if locus_tag not in TSS:
                continue
            newstart = TSS[locus_tag]['StartCodon']
            strand = feature.location.strand
            if strand == 1:
                # Forward strand: the start codon is the left edge (1-based -> 0-based)
                feature.location = SeqFeature.FeatureLocation(
                    SeqFeature.ExactPosition(newstart - 1),
                    feature.location.end,
                    strand=strand)
            else:
                # Reverse strand: the start codon is the right edge (exclusive end)
                feature.location = SeqFeature.FeatureLocation(
                    feature.location.start,
                    SeqFeature.ExactPosition(newstart),
                    strand=strand)
        # Features are edited in place, so the record can be kept as is
        records.append(record)

with open(outfile, "w") as new_gb:
    SeqIO.write(records, new_gb, "genbank")
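
The strand-dependent arithmetic in the script is the easiest part to get wrong, so here is a small sketch of it in isolation, using plain integers instead of Biopython objects. The function name shifted_bounds is hypothetical, and the left edge used for the reverse-strand example (21637) is an illustrative value that does not come from the post.

```python
def shifted_bounds(start0, end0, strand, new_start_codon):
    """Return new (start0, end0) bounds after moving the start codon.

    start0/end0 are 0-based, end-exclusive bounds (the Biopython convention);
    new_start_codon is a 1-based coordinate as written in a GenBank file.
    """
    if strand == 1:
        # Forward strand: the start codon is the left edge, so 1-based -> 0-based
        bounds = (new_start_codon - 1, end0)
    elif strand == -1:
        # Reverse strand: the start codon is the right edge; a 1-based inclusive
        # coordinate has the same value as the 0-based exclusive end
        bounds = (start0, new_start_codon)
    else:
        raise ValueError("strand must be 1 or -1")
    if bounds[0] >= bounds[1]:
        raise ValueError("new start codon lies outside the feature")
    return bounds


# gyrB, forward strand: 5240..7267 becomes 5357..7267
gyrB = shifted_bounds(5239, 7267, 1, 5357)

# Reverse-strand case (hypothetical left edge): 21637..23181 becomes 21637..23172
pstP = shifted_bounds(21636, 23181, -1, 23172)

print(gyrB, pstP)
```

The two returned integers are the values that would be passed as the start and end of a new FeatureLocation.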

The script assumes it is invoked as python program.py <genbankfile> <excel spreadsheet>

The script also assumes that the spreadsheet has the following format:

Gene     Synonym  Tuberculist_annotated_start  Direction  Re-annotated_start  BASS_score
Rv0005   gyrB     5240                         +          5357                -1.782
Rv0012   Rv0012   14089                        +          14134               -1.553
Rv0018c  pstP     23181                        -          23172               -2.077
Rv0032   BioF2    34295                        +          34307               -0.842
Rv0037c  Rv0037c  41202                        -          41163               -0.554
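
To show which of these rows end up in the TSS lookup, here is a sketch that applies the same BASS_score < -0.7 filter to the sample rows. It assumes the rows are available as whitespace-separated text rather than an Excel workbook, so it swaps xlrd for plain string handling; load_tss and SHEET_TEXT are hypothetical names.

```python
SHEET_TEXT = """\
Rv0005 gyrB 5240 + 5357 -1.782
Rv0012 Rv0012 14089 + 14134 -1.553
Rv0018c pstP 23181 - 23172 -2.077
Rv0032 BioF2 34295 + 34307 -0.842
Rv0037c Rv0037c 41202 - 41163 -0.554
"""

BASS_CUTOFF = -0.7  # same threshold as the script in the answer


def load_tss(text, cutoff=BASS_CUTOFF):
    """Map locus_tag -> new start codon and direction for significant rows."""
    tss = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        gene, _synonym, _old_start, direction, new_start, bass = fields
        if float(bass) < cutoff:
            tss[gene] = {"Direction": direction, "StartCodon": int(new_start)}
    return tss


TSS = load_tss(SHEET_TEXT)
for gene in sorted(TSS):
    print(gene, TSS[gene])
```

With the sample data, Rv0037c is left out because its score of -0.554 does not pass the cutoff, so its location would stay unchanged.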

Regarding python - Modifying the location of a GenBank feature, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24636588/
