python - How to download complete genome sequences with Biopython Entrez.esearch


Reposted by: 行者123 Updated: 2023-11-28 17:45:45


I need to download complete genome sequences from NCBI in GenBank (full) format. I am interested in "complete genome" records, not "whole genome" ones.

My script:

from Bio import Entrez
Entrez.email = "asiakXX@wp.pl"
gatunek='Escherichia[ORGN]'
handle = Entrez.esearch(db='nucleotide',
     term=gatunek, property='complete genome' )#title='complete genome[title]')
result = Entrez.read(handle)

As a result I only get small fragments of genomes, about 484 bp in size:

LOCUS       NZ_KE350773              484 bp    DNA     linear   CON 23-AUG-2013
DEFINITION  Escherichia coli E1777 genomic scaffold scaffold9_G, whole genome
       shotgun sequence.

I know how to do this manually through the NCBI website, but that is very time-consuming. The query I use there is:

escherichia[orgn] AND complete genome[title]

That query returns multiple genomes with sizes of around 5,154,862 bp, and this is what I need to reproduce with Entrez.esearch.

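Best answer

Your question is clear, but the full answer is long. The code I provide generates one .fasta file for each of the required E. coli genome sequences, and yes, only for the "Complete Genomes" in NCBI.

You will see that NCBI lists only six complete E. coli reference genomes (http://www.ncbi.nlm.nih.gov/genome/167):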


To help you, here are the GenBank/RefSeq links to their genomes:

Here is my code for parsing the complete genome sequences into .fasta files...

# Imports
from Bio import Entrez
from Bio import SeqIO

#############################
# Retrieve NCBI Data Online #
#############################

Entrez.email     = "asiak@wp.pl"             # Always tell NCBI who you are
genomeAccessions = ['NC_000913', 'NC_002695', 'NC_011750', 'NC_011751', 'NC_017634', 'NC_018658']
# Join with OR: Entrez ANDs space-separated terms, which would match no single record
search           = " OR ".join(acc + "[ACCN]" for acc in genomeAccessions)
handle           = Entrez.esearch(db="nucleotide", term=search, retmode="xml")
result           = Entrez.read(handle)
handle.close()
genomeIds        = result['IdList']
# efetch returns ONE handle containing all GenBank records, one after another
# (if records come back without sequence data, try rettype="gbwithparts")
records          = Entrez.efetch(db="nucleotide", id=genomeIds, rettype="gb", retmode="text")

###############################
# Generate Genome Fasta files #
###############################

sequences   = []  # store your sequences in a list
headers     = []  # store genome names in a list (db_xref ids)

# SeqIO.parse splits the handle into one SeqRecord per genome
for i, genomeGenbank in enumerate(SeqIO.parse(records, "genbank")):

    # store each genome's .gb record in a separate file
    SeqIO.write(genomeGenbank, "genBankRecord_"+str(i)+".gb", "genbank")

    header   = genomeGenbank.features[0].qualifiers['db_xref'][0]   # name the genome using its db_xref ID
    sequence = str(genomeGenbank.seq)                               # obtain genome sequence as a string

    headers.append('>'+header)  # store genome name in list
    sequences.append(sequence)  # store sequence in list

    # store each genome's .fasta in a separate file
    with open("genome"+str(i)+".fasta", "w") as fasta_out:
        fasta_out.write('>'+header+'\n')   # >header line ... followed by:
        fasta_out.write(sequence+'\n')     # sequence ...
records.close()
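
If you would rather reuse the website query from the question than hard-code accessions, something along these lines may work. This is only a minimal sketch: the helper names `build_term` and `download_fasta`, the output filename, the example e-mail address and the optional `[SLEN]` length window are my own assumptions and not part of the code above, and the number of hits depends on what NCBI currently indexes.

```python
def build_term(organism, title_phrase="complete genome", min_len=None, max_len=None):
    """Compose an Entrez query such as 'escherichia[ORGN] AND complete genome[TITLE]'."""
    parts = [f"{organism}[ORGN]", f"{title_phrase}[TITLE]"]
    if min_len is not None and max_len is not None:
        # Optional sequence-length window (assumption), to drop short scaffolds
        parts.append(f"{min_len}:{max_len}[SLEN]")
    return " AND ".join(parts)


def download_fasta(term, out_path, email, retmax=20):
    """Search the nucleotide database and save all hits to one FASTA file (needs network)."""
    from Bio import Entrez  # imported here so build_term works without Biopython

    Entrez.email = email
    handle = Entrez.esearch(db="nucleotide", term=term, retmax=retmax)
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return 0  # nothing matched the query
    handle = Entrez.efetch(db="nucleotide", id=",".join(ids),
                           rettype="fasta", retmode="text")
    with open(out_path, "w") as out:
        out.write(handle.read())  # NCBI already returns FASTA-formatted text
    handle.close()
    return len(ids)


if __name__ == "__main__":
    term = build_term("escherichia")
    print(term)
    # Uncomment to actually query NCBI (hypothetical e-mail and filename):
    # download_fasta(term, "escherichia_complete.fasta", email="you@example.org")
```

Asking efetch for rettype="fasta" skips the GenBank parsing step entirely, which is enough when only the sequences are needed. Keep retmax small while testing, since each complete genome is several megabases.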

Let me know how it goes! Andy

Regarding "python - How to download complete genome sequences with Biopython Entrez.esearch", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18461629/
