python - How to extract the abstract from efetch (Biopython, Entrez)? - 6ren

Reposted by: 太空狗 | Updated: 2023-10-30 01:06:37

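I'm new to Python and want to use the Entrez system from the Bio package to extract abstracts from PubMed. I got my UIDs with esearch (stored in my_list_ges), and I can also download the entries with efetch. However, the result is a list of dictionaries: the entries look like dictionaries, but I can't access them: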

Entrez.email = "my-email@provider.sth"
handle = Entrez.efetch(db="pubmed", id=my_list_ges[0], rettype="null", retmode="xml")
record = Entrez.read(handle)
abstract = record["Abstract"]
handle.close()

This results in a TypeError:

TypeError: list indices must be integers, not str

When I try to retrieve 'Abstract' from the first record instead, I get a KeyError:

>>> record[0]["Abstract"]
KeyError: 'Abstract'

This is strange, because with the esearch results I can easily access my UIDs through the dictionary.
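Since 'Abstract' is not a top-level key of record[0], one way to find where it actually lives is a small recursive search over the parsed result. The helper below is a hypothetical name, not part of Biopython; it walks plain dicts and lists, which should also cover Biopython's parsed elements because, to my understanding, DictElement, ListElement and StringElement subclass dict, list and str.

```python
def find_key_paths(obj, target, path=()):
    """Yield every key/index path in nested dicts and lists that ends at `target`."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            here = path + (key,)
            if key == target:
                yield here
            # Keep searching below this key; the same name may occur deeper too
            yield from find_key_paths(value, target, here)
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_key_paths(item, target, path + (index,))
    # Strings and other scalars have no children, so the search stops there


# Toy stand-in for the Entrez.read() result (assumption: same nesting as record[0] below)
record = [{'MedlineCitation': {'Article': {'Abstract': {'AbstractText': ['...']}}},
           'PubmedData': {}}]
print(list(find_key_paths(record, 'Abstract')))
```

Given the structure of record[0] shown next, the search should report a single path, (0, 'MedlineCitation', 'Article', 'Abstract'), which explains the KeyError: the abstract is three levels below the top of each record.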

The structure of record[0] is:

{u'MedlineCitation': DictElement({
        u'OtherID': [],
        u'OtherAbstract': [],
        u'CitationSubset': ['IM'],
        u'KeywordList': [],
        u'DateCreated': {u'Month': '03', u'Day': '17', u'Year': '2016'},
        u'SpaceFlightMission': [],
        u'GeneralNote': [],
        u'Article':
        DictElement({
            u'ArticleDate': [
                DictElement({u'Month': '03', u'Day': '16', u'Year': '2016'}, attributes={u'DateType': u'Electronic'})],
           u'Pagination': {u'MedlinePgn': 'e0151666'},
           u'AuthorList': ListElement([
            DictElement({
                u'LastName': "O'Neill",
                u'Initials': 'KE',
                u'Identifier': [],
                u'AffiliationInfo': [{
                    u'Affiliation': 'MRC Centre for Regenerative Medicine, Institute for Stem Cell Research, School of Biological Sciences, University of Edinburgh, SCRM Building, 5 Little France Drive, Edinburgh, EH16 4UU, UK.',
                    u'Identifier': []}], 
                u'ForeName': 'Kathy E'
                }, attributes={u'ValidYN': u'Y'}),
            DictElement({
                u'LastName': 'Bredenkamp',
                u'Initials': 'N', u'Identifier': [],
                u'AffiliationInfo': [{
                    u'Affiliation': 'MRC Centre for Regenerative Medicine, Institute for Stem Cell Research, School of Biological Sciences, University of Edinburgh, SCRM Building, 5 Little France Drive, Edinburgh, EH16 4UU, UK.',
                    u'Identifier': []}],
                u'ForeName': 'Nicholas'}, attributes={u'ValidYN': u'Y'}), 
            DictElement({
                u'LastName': 'Tischner',
                u'Initials': 'C',
                u'Identifier': [], 
                u'AffiliationInfo': [{
                    u'Affiliation': 'MRC Centre for Regenerative Medicine, Institute for Stem Cell Research, School of Biological Sciences, University of Edinburgh, SCRM Building, 5 Little France Drive, Edinburgh, EH16 4UU, UK.',
                    u'Identifier': []}],
                u'ForeName': 'Christin'}, attributes={u'ValidYN': u'Y'}),
            DictElement({
                u'LastName': 'Vaidya',
                u'Initials': 'HJ',
                u'Identifier': [],
                u'AffiliationInfo': [{
                    u'Affiliation': 'MRC Centre for Regenerative Medicine, Institute for Stem Cell Research, School of Biological Sciences, University of Edinburgh, SCRM Building, 5 Little France Drive, Edinburgh, EH16 4UU, UK.',
                    u'Identifier': []}],
                u'ForeName': 'Harsh J'}, attributes={u'ValidYN': u'Y'}),
            DictElement({
                u'LastName': 'Stenhouse',
                u'Initials': 'FH',
                u'Identifier': [],
                u'AffiliationInfo': [{
                    u'Affiliation': 'MRC Centre for Regenerative Medicine, Institute for Stem Cell Research, School of Biological Sciences, University of Edinburgh, SCRM Building, 5 Little France Drive, Edinburgh, EH16 4UU, UK.',
                    u'Identifier': []}], u'ForeName': 'Frances H'}, attributes={u'ValidYN': u'Y'}),
            DictElement({
                u'LastName': 'Peddie',
                u'Initials': 'CD',
                u'Identifier': [],
                u'AffiliationInfo': [{
                    u'Affiliation': 'MRC Centre for Regenerative Medicine, Institute for Stem Cell Research, School of Biological Sciences, University of Edinburgh, SCRM Building, 5 Little France Drive, Edinburgh, EH16 4UU, UK.',
                    u'Identifier': []}],
                u'ForeName': 'C Diana'}, attributes={u'ValidYN': u'Y'}),
            DictElement({
                u'LastName': 'Nowell',
                u'Initials': 'CS',
                u'Identifier': [],
                u'AffiliationInfo': [{
                    u'Affiliation': 'MRC Centre for Regenerative Medicine, Institute for Stem Cell Research, School of Biological Sciences, University of Edinburgh, SCRM Building, 5 Little France Drive, Edinburgh, EH16 4UU, UK.',
                    u'Identifier': []}],
                u'ForeName': 'Craig S'}, attributes={u'ValidYN': u'Y'}),
            DictElement({
                u'LastName': 'Gaskell', 
                u'Initials': 'T', 
                u'Identifier': [], 
                u'AffiliationInfo': [{
                    u'Affiliation': 'MRC Centre for Regenerative Medicine, Institute for Stem Cell Research, School of Biological Sciences, University of Edinburgh, SCRM Building, 5 Little France Drive, Edinburgh, EH16 4UU, UK.',
                    u'Identifier': []}],
                u'ForeName': 'Terri'}, attributes={u'ValidYN': u'Y'}),
            DictElement({
                u'LastName': 'Blackburn',
                u'Initials': 'CC',
                u'Identifier': [],
                u'AffiliationInfo': [{
                    u'Affiliation': 'MRC Centre for Regenerative Medicine, Institute for Stem Cell Research, School of Biological Sciences, University of Edinburgh, SCRM Building, 5 Little France Drive, Edinburgh, EH16 4UU, UK.',
                    u'Identifier': []}], u'ForeName': 'C Clare'}, attributes={u'ValidYN': u'Y'})],
            attributes={u'Type': u'authors', u'CompleteYN': u'Y'}),
        u'Language': ['eng'],
        u'PublicationTypeList': [StringElement('Journal Article', attributes={u'UI': u'D016428'})],
        u'Journal': {
            u'ISSN': StringElement('1932-6203', attributes={u'IssnType': u'Electronic'}),
            u'ISOAbbreviation': 'PLoS ONE',
            u'JournalIssue': DictElement({
                u'Volume': '11',
                u'Issue': '3',
                u'PubDate': {u'Year': '2016'}}, attributes={u'CitedMedium': u'Internet'}),
            u'Title': 'PloS one'},
        u'ArticleTitle': 'Foxn1 Is Dynamically Regulated in Thymic Epithelial Cells during Embryogenesis and at the Onset of Thymic Involution.',
        u'ELocationID': [StringElement('10.1371/journal.pone.0151666', attributes={u'ValidYN': u'Y', u'EIdType': u'doi'})],
        u'Abstract': {u'AbstractText': ['--Unnecessarily long abstract removed --']}}, attributes={u'PubModel': u'Electronic-eCollection'}),
        u'PMID': StringElement('26983083', attributes={u'Version': u'1'}),
        u'MedlineJournalInfo': {
            u'MedlineTA': 'PLoS One',
            u'Country': 'United States',
            u'NlmUniqueID': '101285081',
            u'ISSNLinking': '1932-6203'}}, attributes={u'Owner': u'NLM', u'Status': u'In-Data-Review'}),
 u'PubmedData': {
    u'ArticleIdList': [
        StringElement('10.1371/journal.pone.0151666', attributes={u'IdType': u'doi'}),
        StringElement('PONE-D-15-47173', attributes={u'IdType': u'pii'}),
        StringElement('26983083', attributes={u'IdType': u'pubmed'})],
    u'PublicationStatus': 'epublish',
    u'History': [
        DictElement({u'Month': '', u'Day': '', u'Year': '2016'}, attributes={u'PubStatus': u'ecollection'}),
        DictElement({u'Month': '10', u'Day': '28', u'Year': '2015'}, attributes={u'PubStatus': u'received'}),
        DictElement({u'Month': '3', u'Day': '2', u'Year': '2016'}, attributes={u'PubStatus': u'accepted'}),
        DictElement({u'Month': '3', u'Day': '16', u'Year': '2016'}, attributes={u'PubStatus': u'epublish'}),
        DictElement({u'Minute': '0', u'Month': '3', u'Day': '17', u'Hour': '6', u'Year': '2016'}, attributes={u'PubStatus': u'entrez'}),
        DictElement({u'Minute': '0', u'Month': '3', u'Day': '18', u'Hour': '6', u'Year': '2016'}, attributes={u'PubStatus': u'pubmed'}),
        DictElement({u'Minute': '0', u'Month': '3', u'Day': '18', u'Hour': '6', u'Year': '2016'}, attributes={u'PubStatus': u'medline'})]}
}
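Reading the path off the structure above, the abstract sits at record[0]["MedlineCitation"]["Article"]["Abstract"]["AbstractText"], and AbstractText is a list (structured abstracts have one entry per section). Below is a minimal sketch that collects one abstract per article. The function name is hypothetical, and it assumes the parsed result is either the list shown above or, as I understand newer Biopython releases return it, a dict whose 'PubmedArticle' key holds that list.

```python
def extract_abstracts(parsed):
    """Return one abstract string per PubMed article in an Entrez.read() result.

    `parsed` is either a list of article records (as in the question) or a dict
    with a 'PubmedArticle' key holding that list (assumed newer Biopython layout).
    """
    if isinstance(parsed, dict) and 'PubmedArticle' in parsed:
        articles = parsed['PubmedArticle']
    else:
        articles = parsed
    abstracts = []
    for article in articles:
        # Path taken from the record structure: MedlineCitation -> Article -> Abstract
        abstract = article['MedlineCitation']['Article'].get('Abstract')
        if abstract is None:
            # Some records (e.g. letters, editorials) carry no abstract at all
            abstracts.append('')
            continue
        # AbstractText is a list; join its sections into one string
        abstracts.append(' '.join(str(part) for part in abstract['AbstractText']))
    return abstracts
```

In the question's code this would replace record["Abstract"], e.g. abstracts = extract_abstracts(Entrez.read(handle)). Joining with a space drops the section labels of structured abstracts; keep the AbstractText list itself if those labels matter.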

Best answer

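I find it much easier to fetch Medline records and parse those. I've included complete working code for a related query: query = "Tischner[AU] Cortex-specific down-regulation". The key point in the code below is that the fetch_rec() function uses rettype='Medline', retmode='text', and the resulting record is then parsed with Biopython's Medline module.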
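For context on what the Medline module is parsing: in the MEDLINE text format each field starts with a tag padded to four characters followed by "- " (the abstract is "AB  - "), and long values continue on lines indented by six spaces. The sketch below only illustrates that layout; the helper name and the sample text are made up, and it is not how Bio.Medline is implemented.

```python
def medline_field(text, tag):
    """Collect every value of one MEDLINE field (e.g. 'AB' for the abstract).

    Illustration of the text layout only: a field line is '<tag padded to 4>- value',
    and continuation lines start with six spaces.
    """
    values = []
    current = None
    for line in text.splitlines():
        if line.startswith('      ') and current is not None:
            # Continuation of the field we are currently collecting
            current.append(line.strip())
        elif len(line) >= 6 and line[4:6] == '- ':
            if line[:4].rstrip() == tag:
                current = [line[6:].strip()]
                values.append(current)
            else:
                # A different field starts; ignore it and its continuations
                current = None
        else:
            # Blank line (record separator) or anything unexpected
            current = None
    return [' '.join(parts) for parts in values]


# Hypothetical sample record, shortened for the example
sample = ("PMID- 26983083\n"
          "AB  - First line of the abstract\n"
          "      continues here.\n"
          "FAU - O'Neill, Kathy E\n")
print(medline_field(sample, 'AB'))
```

This is why the answer checks for the 'AB' key: Bio.Medline exposes each record as a dict keyed by these tags.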

from io import StringIO  # Python 3; the original used Python 2's "from StringIO import StringIO"
from Bio import Entrez, Medline

def search_medline(query, email):
    """Run an ESearch on PubMed and return the parsed result (IdList, WebEnv, QueryKey, ...)."""
    Entrez.email = email
    search = Entrez.esearch(db='pubmed', term=query, usehistory='y')
    try:
        # Parse inside the try block so parsing errors are actually caught
        return Entrez.read(search)
    except Exception as e:
        raise IOError(str(e))
    finally:
        search.close()

def fetch_rec(rec_id, entrez_handle):
    """Fetch one PubMed record as MEDLINE-formatted plain text."""
    fetch_handle = Entrez.efetch(db='pubmed', id=rec_id,
                                 rettype='Medline', retmode='text',
                                 webenv=entrez_handle['WebEnv'],
                                 query_key=entrez_handle['QueryKey'])
    try:
        return fetch_handle.read()
    finally:
        fetch_handle.close()

def main(query, email):
    rec_handler = search_medline(query, email)

    for rec_id in rec_handler['IdList']:
        rec = fetch_rec(rec_id, rec_handler)
        # Medline.read() expects a file-like object, so wrap the text in StringIO
        medline_rec = Medline.read(StringIO(rec))
        # 'AB' is the MEDLINE tag for the abstract; not every record has one
        if 'AB' in medline_rec:
            print(medline_rec['AB'])

if __name__ == '__main__':
    email = "my-email@provider.sth"
    query = "Tischner[AU] Cortex-specific down-regulation"
    main(query, email)

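It prints the abstract you are looking for, and by changing the query argument the script can be adapted to any search. There are more efficient ways to retrieve large numbers of records, but for small searches this is good enough.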
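One of those more efficient options, sketched under assumptions: because search_medline() already runs esearch with usehistory='y', efetch can pull records from the server-side history in batches using retstart/retmax instead of one request per ID, and Medline.parse() can iterate over every record in a batch. The function names and the default batch size of 100 are illustrative choices, not part of the answer.

```python
def batch_ranges(count, batch_size):
    """Yield (retstart, retmax) pairs that cover `count` records in chunks."""
    for start in range(0, count, batch_size):
        yield start, min(batch_size, count - start)


def fetch_abstracts_in_batches(search_result, batch_size=100):
    """Yield (PMID, abstract) pairs for all hits of a history-enabled esearch.

    `search_result` is assumed to be the Entrez.read() result of an esearch
    run with usehistory='y' (it must contain Count, WebEnv and QueryKey).
    """
    # Imported here so batch_ranges() can be used without Biopython installed
    from Bio import Entrez, Medline

    count = int(search_result['Count'])
    for retstart, retmax in batch_ranges(count, batch_size):
        handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                               retstart=retstart, retmax=retmax,
                               webenv=search_result['WebEnv'],
                               query_key=search_result['QueryKey'])
        try:
            # Medline.parse() yields one dict-like record per article
            for rec in Medline.parse(handle):
                if 'AB' in rec:
                    yield rec.get('PMID'), rec['AB']
        finally:
            handle.close()
```

With the answer's helpers this would be used as: for pmid, abstract in fetch_abstracts_in_batches(search_medline(query, email)): print(pmid, abstract). For 250 hits and a batch size of 100 it issues three efetch requests (retstart 0, 100, 200) rather than 250.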

Regarding "python - How to extract the abstract from efetch (Biopython, Entrez)?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36087715/

python - Applying a language filter to Entrez.esearch and Entrez.efetch
r - How to convert the Entrez IDs in a large list to gene symbols and replace the Entrez IDs in the list in R?
python - BioPython: skipping bad GIDs with Entrez.esummary/Entrez.read
python - Entrez eFetch accession numbers
python - biopython - Entrez.esearch() query translation does not match my query
python - Can't get Entrez to return MeSH terms using biopython
python - How to navigate the results of Biopython Entrez efetch?
python - How to download complete genome sequences with biopython entrez.esearch
python - Problems parsing publication data from PubMed with Entrez
python - How to extract a complete list of PMC article titles and abstracts with Bio.Entrez?
python - Traceback KeyError when Entrez retmax is increased
python - Accessing the sequence element of fasta records with Biopython Entrez
python - Entrez epost + elink returns out-of-order results with Biopython
python - XLRD/Entrez : Search through Pubmed and extract the counts
biopython - Entrez.esummary ('gene' db) : how to retrieve uid from DictElement?
java - Accessing PubMed abstracts with the Entrez Utilities web service
python - Retrieving and parsing protein sequences from GenBank with Entrez in BioPython
python - Searching titles in the medline database with entrez and biopython
python - Problems with Bio.Entrez and proteins in Biopython 1.60
