
python - Parsing Grobid .tei.xml output with Beautiful Soup

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:44:31

I am trying to use Beautiful Soup to extract elements from a .tei.xml file generated by Grobid.

I can get the titles with:

titles = soup.findAll('title')

What is the correct syntax for accessing the "lower-level" elements (authors, affiliations, and so on)?

Here is part of the tei.xml file that Grobid produced:

 <?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 /data/grobid-0.5.1/grobid-home/schemas/xsd/Grobid.xsd"
xmlns:xlink="http://www.w3.org/1999/xlink">
<teiHeader xml:lang="en">
<encodingDesc>
<appInfo>
<application version="0.5.1-SNAPSHOT" ident="GROBID" when="2018-08-15T14:51+0000">
<ref target="https://github.com/kermitt2/grobid">GROBID - A machine learning software for extracting information from scholarly documents</ref>
</application>
</appInfo>
</encodingDesc>
<fileDesc>
<titleStmt>
<title level="a" type="main">The Role of Artificial Intelligence in Software Engineering</title>
</titleStmt>
<publicationStmt>
<publisher/>
<availability status="unknown"><licence/></availability>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Mark</forename><surname>Harman</surname></persName>
<affiliation key="aff0">
<orgName type="department">CREST Centre</orgName>
<orgName type="institution">University College London</orgName>
<address>
<addrLine>Malet Place</addrLine>
<postCode>WC1E 6BT</postCode>
<settlement>London</settlement>
<country key="GB">UK</country>
</address>
</affiliation>
</author>
<title level="a" type="main">The Role of Artificial Intelligence in Software Engineering</title>
</analytic>
<monogr>
<imprint>
<date/>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
</fileDesc>

Thanks.

Best answer

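BeautifulSoup lowercases tag names when it parses the file with an HTML parser (for example, teiHeader becomes teiheader). Here are some examples: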

title = soup.html.body.teiheader.filedesc.analytic.title.string

for author in soup.html.body.teiheader.filedesc.sourcedesc.find_all('author'):
    # Attribute access returns the first matching descendant, or None if absent
    tag_or_none = author.persname.forename
    first_affiliation = author.affiliation
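
The lookups above can be put together into a runnable sketch. It makes a few assumptions that are not in the answer: it uses the built-in "html.parser" rather than lxml (so no html/body wrapper is added and the lookups start from the soup itself), it works on a trimmed copy of the XML from the question, and the helper names text_of and extract_authors are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the Grobid output from the question. Assumption: the real
# file has the same nesting; self-closing elements are left out for brevity.
TEI_SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader xml:lang="en">
<fileDesc>
<titleStmt>
<title level="a" type="main">The Role of Artificial Intelligence in Software Engineering</title>
</titleStmt>
<sourceDesc>
<biblStruct>
<analytic>
<author>
<persName><forename type="first">Mark</forename><surname>Harman</surname></persName>
<affiliation key="aff0">
<orgName type="department">CREST Centre</orgName>
<orgName type="institution">University College London</orgName>
<address>
<settlement>London</settlement>
<country key="GB">UK</country>
</address>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
</teiHeader>
</TEI>
"""


def text_of(tag):
    """Return the stripped text of a tag, or None if the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_authors(tei_markup):
    """Hypothetical helper: collect name and affiliation fields per author."""
    # "html.parser" lowercases tag names, so every lookup below is lowercase.
    soup = BeautifulSoup(tei_markup, "html.parser")
    authors = []
    for author in soup.find_all("author"):
        affiliation = author.find("affiliation")
        institution = None
        if affiliation is not None:
            # Two orgName elements exist; pick one by its type attribute.
            institution = affiliation.find("orgname", attrs={"type": "institution"})
        authors.append({
            "forename": text_of(author.find("forename")),
            "surname": text_of(author.find("surname")),
            "institution": text_of(institution),
            "country": text_of(author.find("country")),
        })
    return authors


soup = BeautifulSoup(TEI_SAMPLE, "html.parser")
main_title = text_of(soup.find("title", attrs={"type": "main"}))
authors = extract_authors(TEI_SAMPLE)
print(main_title)
print(authors)
```

Using find() with an explicit None check avoids the AttributeError that chained attribute access raises when an intermediate element is missing. If the file is parsed with BeautifulSoup's "xml" parser instead (which requires lxml), tag names keep their original case, so the lookups would use teiHeader, persName and orgName.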

Also, see the BeautifulSoup documentation; it covers everything.

I am working on a similar problem right now and am looking for collaborators. If you would like to team up, let me know - sof@nconnor.com

Regarding python - Parsing Grobid .tei.xml output with Beautiful Soup, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52594370/
