python - Getting abstracts from PubMed

Reposted by: 行者123 · Updated: 2023-11-30 22:30:40


I'm having trouble getting abstracts from the following query:

from Bio import Entrez

Entrez.email = "anonymous@gmail.com"
esearch_query = Entrez.esearch(db="pubmed", term="cancer AND food", retmode="xml")
esearch_result = Entrez.read(esearch_query)

# Now we need to get all papers from our search using the IDList
# NOTE: [-1] selects only the last ID (a string), so this loop iterates over
# that ID's individual characters rather than over the list of IDs.
for iden in esearch_result['IdList'][-1]:
    pubmed_entry = Entrez.efetch(db="pubmed", id=iden, retmode="xml")
    result = Entrez.read(pubmed_entry)
    print(result)

The output looks like this (showing just one entry as an example):

{u'PubmedArticle': [{u'MedlineCitation': DictElement({u'DateCompleted': {u'Month': '01', u'Day': '10', u'Year': '1976'}, u'OtherID': [], u'DateRevised': {u'Month': '03', u'Day': '22', u'Year': '2017'}, u'MeshHeadingList': [{u'QualifierName': [], u'DescriptorName': StringElement('Binding Sites', attributes={u'UI': u'D001665', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'N'})], u'DescriptorName': StringElement('Cobalt', attributes={u'UI': u'D003035', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Hemoglobins', attributes={u'UI': u'D006454', u'MajorTopicYN': u'Y'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Humans', attributes={u'UI': u'D006801', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Hydrogen-Ion Concentration', attributes={u'UI': u'D006863', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'N'})], u'DescriptorName': StringElement('Iron', attributes={u'UI': u'D007501', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Ligands', attributes={u'UI': u'D008024', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Mathematics', attributes={u'UI': u'D008433', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'Y'})], u'DescriptorName': StringElement('Oxygen', attributes={u'UI': u'D010100', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Oxyhemoglobins', attributes={u'UI': u'D010108', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Protein Binding', attributes={u'UI': u'D011485', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Spectrum Analysis', attributes={u'UI': u'D013057', u'MajorTopicYN': u'N'})}], 
u'OtherAbstract': [], u'CitationSubset': ['IM'], u'ChemicalList': [{u'NameOfSubstance': StringElement('Hemoglobins', attributes={u'UI': u'D006454'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Ligands', attributes={u'UI': u'D008024'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Oxyhemoglobins', attributes={u'UI': u'D010108'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Cobalt', attributes={u'UI': u'D003035'}), u'RegistryNumber': '3G0H8C9362'}, {u'NameOfSubstance': StringElement('Iron', attributes={u'UI': u'D007501'}), u'RegistryNumber': 'E1UOL152H7'}, {u'NameOfSubstance': StringElement('Oxygen', attributes={u'UI': u'D010100'}), u'RegistryNumber': 'S88TT14065'}], u'KeywordList': [], u'DateCreated': {u'Month': '01', u'Day': '10', u'Year': '1976'}, u'SpaceFlightMission': [], u'GeneralNote': [], u'Article': DictElement({u'ArticleDate': [], u'Pagination': {u'MedlinePgn': '1424-31'}, u'AuthorList': ListElement([DictElement({u'LastName': 'Chow', u'Initials': 'YW', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'Y W'}, attributes={u'ValidYN': u'Y'}), DictElement({u'LastName': 'Pietranico', u'Initials': 'R', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'R'}, attributes={u'ValidYN': u'Y'}), DictElement({u'LastName': 'Mukerji', u'Initials': 'A', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'A'}, attributes={u'ValidYN': u'Y'})], attributes={u'CompleteYN': u'Y'}), u'Language': ['eng'], u'PublicationTypeList': [StringElement('Journal Article', attributes={u'UI': u'D016428'}), StringElement("Research Support, U.S. Gov't, Non-P.H.S.", attributes={u'UI': u'D013486'})], u'Journal': {u'ISSN': StringElement('0006-291X', attributes={u'IssnType': u'Print'}), u'ISOAbbreviation': 'Biochem. Biophys. Res. 
Commun.', u'JournalIssue': DictElement({u'Volume': '66', u'Issue': '4', u'PubDate': {u'Month': 'Oct', u'Day': '27', u'Year': '1975'}}, attributes={u'CitedMedium': u'Print'}), u'Title': 'Biochemical and biophysical research communications'}, u'ArticleTitle': 'Studies of oxygen binding energy to hemoglobin molecule.', u'ELocationID': []}, attributes={u'PubModel': u'Print'}), u'PMID': StringElement('6', attributes={u'Version': u'1'}), u'MedlineJournalInfo': {u'MedlineTA': 'Biochem Biophys Res Commun', u'Country': 'United States', u'NlmUniqueID': '0372516', u'ISSNLinking': '0006-291X'}}, attributes={u'Status': u'MEDLINE', u'Owner': u'NLM'}), u'PubmedData': {u'ArticleIdList': [StringElement('6', attributes={u'IdType': u'pubmed'}), StringElement('0006-291X(75)90518-5', attributes={u'IdType': u'pii'})], u'PublicationStatus': 'ppublish', u'History': [DictElement({u'Month': '10', u'Day': '27', u'Year': '1975'}, attributes={u'PubStatus': u'pubmed'}), DictElement({u'Minute': '1', u'Month': '10', u'Day': '27', u'Hour': '0', u'Year': '1975'}, attributes={u'PubStatus': u'medline'}), DictElement({u'Minute': '0', u'Month': '10', u'Day': '27', u'Hour': '0', u'Year': '1975'}, attributes={u'PubStatus': u'entrez'})]}}], u'PubmedBookArticle': []}

How do I get the abstract? The end goal is to have some fields (e.g. title, abstract, ...) in a SQL database.

Thanks, David

Best answer

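What may be working against you is that MEDLINE PubMed records from before 1975 typically have no abstracts, and your example sits right on the cusp at 1975. Using your code with a different query, the following displays two article IDs, one with an abstract and one without: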

from Bio import Entrez

Entrez.email = "anonymous@gmail.com"

esearch_query = Entrez.esearch(db="pubmed", term="cancer AND wombats", retmode="xml")
esearch_result = Entrez.read(esearch_query)

for identifier in esearch_result['IdList']:
    pubmed_entry = Entrez.efetch(db="pubmed", id=identifier, retmode="xml")
    result = Entrez.read(pubmed_entry)

    article = result['PubmedArticle'][0]['MedlineCitation']['Article']

    if 'Abstract' in article:
        print(article['Abstract']['AbstractText'])

Truncated output:

['This report catalogues all spontaneous proliferations in macropods, koalas, wombats, and possums and gliders held by the Comparative Pathology Registry at Taronga Zoo. Proliferative lesions were present in 14 macropods, 26 koalas, two wombats and 22 possums and gliders. Most neoplasms recorded in macropods were singular and ....']
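Since the printed value is a list, and the asker wants plain title and abstract fields, one possible way to flatten a record is sketched below. The helper name `extract_title_and_abstract` and the two stand-in records are hypothetical; the stand-ins are plain dicts shaped like the `Entrez.read()` output shown above so the sketch runs without network access. It assumes `AbstractText` is a list of string-like parts, as in the output above.

```python
def extract_title_and_abstract(record):
    """Return (title, abstract) for one entry of result['PubmedArticle'].

    abstract is None when the record has no 'Abstract' key.
    """
    article = record['MedlineCitation']['Article']
    title = str(article.get('ArticleTitle', ''))
    abstract = None
    if 'Abstract' in article:
        # AbstractText is a list; join its parts into a single string.
        parts = article['Abstract'].get('AbstractText', [])
        abstract = ' '.join(str(part) for part in parts)
    return title, abstract


# Hypothetical stand-in records (not real PubMed data apart from the title
# taken from the question's output).
no_abstract = {'MedlineCitation': {'Article': {
    'ArticleTitle': 'Studies of oxygen binding energy to hemoglobin molecule.',
}}}
with_abstract = {'MedlineCitation': {'Article': {
    'ArticleTitle': 'Placeholder title',
    'Abstract': {'AbstractText': ['First part.', 'Second part.']},
}}}

print(extract_title_and_abstract(no_abstract))
print(extract_title_and_abstract(with_abstract))
```

With real data, the same function would be called on `result['PubmedArticle'][0]` inside the loop.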

For details, see the documentation: MEDLINE PubMed XML Element Descriptions
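For the stated end goal of keeping fields such as title and abstract in a SQL database, a minimal sketch using Python's standard `sqlite3` module might look like the following. The table name `articles`, its columns, and the helper `save_articles` are assumptions made for illustration; neither the question nor the answer specifies a schema or a database engine. Records without an abstract are stored with a NULL abstract.

```python
import sqlite3


def save_articles(conn, rows):
    """Insert (pmid, title, abstract) tuples; abstract may be None."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "pmid TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    # Placeholders let sqlite3 handle quoting; None is stored as NULL.
    conn.executemany(
        "INSERT OR REPLACE INTO articles (pmid, title, abstract) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
save_articles(conn, [
    # PMID and title from the question's output; that record has no abstract.
    ("6", "Studies of oxygen binding energy to hemoglobin molecule.", None),
    # Hypothetical placeholder row, not a real PubMed record.
    ("0000", "Placeholder title", "Placeholder abstract text."),
])

query = "SELECT pmid, title, abstract IS NULL FROM articles ORDER BY pmid"
for row in conn.execute(query):
    print(row)
```

Using `INSERT OR REPLACE` keyed on the PMID means re-running the same search updates existing rows instead of failing on duplicates; whether that is the desired behavior depends on the application.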

A similar question about python - getting abstracts from PubMed can be found on Stack Overflow: https://stackoverflow.com/questions/45979650/
