
r - Parsing XML files in R with rentrez

Reposted. Author: 行者123. Last updated: 2023-12-03 17:02:42

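I am not an XML expert. I am having trouble parsing the XML returned by rentrez with the XML package. For each pmid (the article ID in the PubMed database) I want the authors and their affiliations as output. My code works well unless an author has more than one affiliation. When that happens, the columns first_names, last_names and affiliation have different lengths and an error is returned. I really do not have the expertise to deal with XML parsing. I need exactly the following result, with one row per author and affiliation: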

pmid      first_names  last_names  affiliation
27869504  Luca         Villa       Division of Experimental Oncology/Unit of Urology, URI , IRCCS Ospedale San Raffaele, Milan, Italy
27869504  Luca         Villa       Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France
27869504  Tarik Emre   Şener       Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France
27869504  Tarik Emre   Şener       Department of Urology, Marmara University School of Medicine, Istanbul, Turkey


The structure of a sample XML file returned by entrez_fetch is as follows:

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2017//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_170101.dtd">
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Status="In-Data-Review" Owner="NLM">
<PMID Version="1">27869504</PMID>
<DateCreated>
<Year>2016</Year>
<Month>11</Month>
<Day>21</Day>
</DateCreated>
<DateRevised>
<Year>2017</Year>
<Month>01</Month>
<Day>06</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1557-900X</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>31</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2017</Year>
<Month>Jan</Month>
</PubDate>
</JournalIssue>
<Title>Journal of endourology</Title>
<ISOAbbreviation>J. Endourol.</ISOAbbreviation>
</Journal>
<ArticleTitle>Initial Content Validation Results of a New Simulation Model for Flexible Ureteroscopy: The Key-Box.</ArticleTitle>
<Pagination>
<MedlinePgn>72-77</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1089/end.2016.0677</ELocationID>
<Abstract>
<AbstractText Label="PURPOSE" NlmCategory="OBJECTIVE">We sought to test the content validity of a new training model for flexible ureteroscopy: the Key-Box.</AbstractText>
<AbstractText Label="MATERIAL AND METHODS" NlmCategory="METHODS">Sixteen medical students were randomized to undergo a 10-day training consisting of performing 10 different exercises aimed at learning specific movements with the flexible ureteroscope, and how to catch and release stones with a nitinol basket using the Key-Box (n&#x2009;=&#x2009;8 students in the training group, n&#x2009;=&#x2009;8 students in the nontraining control group). Subsequently, an expert endourologist (O.T.) blindly assessed skills acquired by the whole cohort of students through two exercises on ureteroscope manipulation and one exercise on stone capture selected among those used for the training. A performance scale (1-5) assessing different steps of the procedure was used to evaluate each student. Time to complete the exercises was measured. Mann-Whitney Rank Sum test was used for comparisons between the two groups.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">Mean scores obtained by trained students were significantly higher compared with those obtained by nontrained students (all p&#x2009;&lt;&#x2009;0.001). All trained students were able to complete the two exercises on ureteroscope manipulation within 3 minutes, whereas two students (25%) were not able to finish the exercise on stone capture. Conversely, four (50%) and six (75%) nontrained students were not able to finish one out of the two exercises on ureteroscope manipulation and the exercise on stone capture, respectively. The mean time to complete the three exercises was 76.3, 69.9, and 107 and 172.5, 137.9, and 168 seconds in the trained and nontrained groups, respectively (all p&#x2009;&lt;&#x2009;0.001).</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">The K-Box(&#xAE;) seems to be a valid easy-to-use training model for initiating novel endoscopists to flexible ureteroscopy.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Villa</LastName>
<ForeName>Luca</ForeName>
<Initials>L</Initials>
<AffiliationInfo>
<Affiliation>1 Division of Experimental Oncology/Unit of Urology, URI , IRCCS Ospedale San Raffaele, Milan, Italy .</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>2 Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France .</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>&#x15E;ener</LastName>
<ForeName>Tarik Emre</ForeName>
<Initials>TE</Initials>
<AffiliationInfo>
<Affiliation>2 Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France .</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>3 Department of Urology, Marmara University School of Medicine , Istanbul, Turkey .</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Somani</LastName>
<ForeName>Bhaskar K</ForeName>
<Initials>BK</Initials>
<AffiliationInfo>
<Affiliation>4 Department of Urology, University Hospital Southampton NHS Trust , Southampton, United Kingdom .</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Cloutier</LastName>
<ForeName>Jonathan</ForeName>
<Initials>J</Initials>
<AffiliationInfo>
<Affiliation>2 Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France .</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>5 Department of Urology, University Hospital Centre of Quebec City , Quebec, Canada .</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Buttic&#xE8;</LastName>
<ForeName>Salvatore</ForeName>
<Initials>S</Initials>
<AffiliationInfo>
<Affiliation>2 Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France .</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>6 Department of Urology, University of Messina , Messina, Italy .</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Marson</LastName>
<ForeName>Francesco</ForeName>
<Initials>F</Initials>
<AffiliationInfo>
<Affiliation>2 Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France .</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>7 Department of Urology, Citt&#xE0; della Salute e della Scienza, Turin, Italy .</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Doizi</LastName>
<ForeName>Steeve</ForeName>
<Initials>S</Initials>
<AffiliationInfo>
<Affiliation>2 Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France .</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Proietti</LastName>
<ForeName>Silvia</ForeName>
<Initials>S</Initials>
<AffiliationInfo>
<Affiliation>2 Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France .</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>8 Department of Urology, IRCCS San Raffaele Scientific Institute , Ville Turro Division, Milan, Italy .</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Traxer</LastName>
<ForeName>Olivier</ForeName>
<Initials>O</Initials>
<AffiliationInfo>
<Affiliation>2 Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France .</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>12</Month>
<Day>16</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>J Endourol</MedlineTA>
<NlmUniqueID>8807503</NlmUniqueID>
<ISSNLinking>0892-7790</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">flexible ureteroscopy</Keyword>
<Keyword MajorTopicYN="N">learning curve</Keyword>
<Keyword MajorTopicYN="N">training model</Keyword>
<Keyword MajorTopicYN="N">ureteroscopy curriculum</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="pubmed">
<Year>2016</Year>
<Month>11</Month>
<Day>22</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2016</Year>
<Month>11</Month>
<Day>22</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2016</Year>
<Month>11</Month>
<Day>22</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">27869504</ArticleId>
<ArticleId IdType="doi">10.1089/end.2016.0677</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>


Below is the code I am using. It works fine except when an article's authors have more than one affiliation in the PubMed database:

library(rentrez)
library(XML)

pubmedSearch <- entrez_search("pubmed", term = "flexible ureteroscope Simulation Model",
                              retmax = 10)
SearchResults <- entrez_fetch(db = "pubmed", pubmedSearch$ids, rettype = "xml",
                              parsed = TRUE)

xmlGetValue <- function(x, node){
  a <- xpathSApply(x, node, xmlValue)
  if(length(a) == 0) {a <- NA} else {a}
}

parse_paper <- function(paper){
  pmid <- xmlGetValue(paper, ".//ArticleId[@IdType='pubmed']")
  first_names <- xmlGetValue(paper, ".//Author/ForeName")
  last_names <- xmlGetValue(paper, ".//Author/LastName")
  affiliation <- xmlGetValue(paper, ".//AffiliationInfo/Affiliation")
  data.frame(pmid = pmid, first_names = first_names, last_names = last_names,
             affiliation = affiliation)
}

parse_multiple_papers <- function(papers){
  res <- xpathApply(papers, "/PubmedArticleSet/*", parse_paper)
  do.call(rbind.data.frame, res)
}

test_df <- parse_multiple_papers(SearchResults)
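
The length mismatch described in the question can be reproduced without a network call. The snippet below is a minimal sketch, assuming a tiny hand-written record (`xml_text` is an illustrative stand-in, not PubMed data): two authors carry three affiliations in total, so the per-paper vectors no longer line up.

```r
library(XML)

# Hypothetical two-author record; the first author has two affiliations.
xml_text <- '<PubmedArticle><AuthorList>
<Author><LastName>A</LastName><ForeName>X</ForeName>
<AffiliationInfo><Affiliation>Aff 1</Affiliation></AffiliationInfo>
<AffiliationInfo><Affiliation>Aff 2</Affiliation></AffiliationInfo>
</Author>
<Author><LastName>B</LastName><ForeName>Y</ForeName>
<AffiliationInfo><Affiliation>Aff 3</Affiliation></AffiliationInfo>
</Author>
</AuthorList></PubmedArticle>'
doc <- xmlParse(xml_text, asText = TRUE)

# Same paper-level queries as in parse_paper above.
first_names <- xpathSApply(doc, "//Author/ForeName", xmlValue)
affiliation <- xpathSApply(doc, "//AffiliationInfo/Affiliation", xmlValue)

length(first_names)  # 2
length(affiliation)  # 3

# data.frame() refuses vectors of length 2 and 3; capture the error message.
result <- tryCatch(
  data.frame(first_names = first_names, affiliation = affiliation),
  error = function(e) conditionMessage(e)
)
print(result)
```

Note that data.frame() recycles shorter vectors when the longer length is a whole multiple of the shorter one, so a paper with two authors and four affiliations would not raise an error but could pair authors with the wrong affiliations. Collecting affiliations per author, rather than per paper, avoids both cases.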


Thank you very much for your help and support.

Best answer

This question was also raised as an issue in rentrez's repository, where one possible solution is described in detail. I will include that code here as well:

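# Extract one author's names and affiliations from an <Author> node.
parse_author <- function(author){
  fn <- xmlValue(author[["ForeName"]])
  ln <- xmlValue(author[["LastName"]])
  # Multiple affiliations are joined into a single "; "-separated string,
  # so every author yields exactly one row.
  aff <- paste(xpathApply(author, "AffiliationInfo/Affiliation", xmlValue), collapse = "; ")
  list(forname = fn, lastname = ln, affiliation = aff)
}

# Build one data frame per paper: one row per author, plus the paper's pmid.
parse_paper <- function(paper){
  author_info <- xpathApply(paper, ".//AuthorList/Author", parse_author)
  res <- do.call(rbind.data.frame, author_info)
  res$pmid <- xpathSApply(paper, ".//ArticleId[@IdType='pubmed']", xmlValue)
  res
}

# Apply parse_paper to every record in the fetched set and stack the results.
parse_multiple_papers <- function(papers){
  res <- xpathApply(papers, "/PubmedArticleSet/*", parse_paper)
  do.call(rbind.data.frame, res)
}

head(parse_multiple_papers(SearchResults))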
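
The code above returns one row per author, with multiple affiliations joined by "; ". The question asks for one row per author and affiliation instead. The following is a minimal sketch of that variant, not the code from the rentrez issue. The helper names (`value_or_na`, `clean_affiliation`, `parse_author_long`, `parse_paper_long`, `parse_multiple_papers_long`) are hypothetical, and the inline `xml_text` is a two-author excerpt of the sample record so that the example runs offline. It also assumes the pmid can be read from MedlineCitation/PMID, which appears in the sample record, and that the leading index number and trailing " ." in each affiliation should be stripped to match the expected output.

```r
library(XML)

# Two authors excerpted from the sample record in the question.
xml_text <- '<PubmedArticleSet><PubmedArticle>
<MedlineCitation><PMID Version="1">27869504</PMID><Article><AuthorList>
<Author><LastName>Villa</LastName><ForeName>Luca</ForeName>
<AffiliationInfo><Affiliation>1 Division of Experimental Oncology/Unit of Urology, URI , IRCCS Ospedale San Raffaele, Milan, Italy .</Affiliation></AffiliationInfo>
<AffiliationInfo><Affiliation>2 Department of Urology, Tenon Hospital, Pierre and Marie Curie University , Paris, France .</Affiliation></AffiliationInfo>
</Author>
<Author><LastName>Somani</LastName><ForeName>Bhaskar K</ForeName>
<AffiliationInfo><Affiliation>4 Department of Urology, University Hospital Southampton NHS Trust , Southampton, United Kingdom .</Affiliation></AffiliationInfo>
</Author>
</AuthorList></Article></MedlineCitation>
</PubmedArticle></PubmedArticleSet>'
papers <- xmlParse(xml_text, asText = TRUE)

# Return the text of all nodes matching `path`, or NA if there are none.
value_or_na <- function(node, path) {
  v <- xpathSApply(node, path, xmlValue)
  if (length(v) == 0) NA_character_ else v
}

# Assumption: drop the leading index ("1 ") and the trailing " ." shown in the sample.
clean_affiliation <- function(x) {
  x <- sub("^[0-9]+ +", "", x)
  sub(" *\\.$", "", x)
}

# One row per affiliation; the length-1 names are recycled by data.frame().
parse_author_long <- function(author) {
  data.frame(
    first_names = value_or_na(author, "ForeName"),
    last_names  = value_or_na(author, "LastName"),
    affiliation = clean_affiliation(value_or_na(author, "AffiliationInfo/Affiliation")),
    stringsAsFactors = FALSE
  )
}

parse_paper_long <- function(paper) {
  pmid <- value_or_na(paper, "./MedlineCitation/PMID")[1]
  authors <- xpathApply(paper, ".//AuthorList/Author", parse_author_long)
  if (length(authors) == 0) return(NULL)  # records without an author list are skipped
  data.frame(pmid = pmid, do.call(rbind, authors), stringsAsFactors = FALSE)
}

parse_multiple_papers_long <- function(papers) {
  res <- xpathApply(papers, "/PubmedArticleSet/PubmedArticle", parse_paper_long)
  do.call(rbind, Filter(Negate(is.null), res))
}

long_df <- parse_multiple_papers_long(papers)
print(long_df)
```

With a live result, `parse_multiple_papers_long(SearchResults)` would be called in the same way. Authors without any affiliation get a single row with NA in the affiliation column, so they are not silently dropped.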

Regarding "r - Parsing XML files in R with rentrez", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42593415/
