
python - How to find a string inside XML-like tags in a file using Python?

Reposted. Author: 行者123. Updated: 2023-11-28 22:19:45

I have an RDF document that looks like this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
xmlns:cd="http:xyz.com#">

<rdf:Description rdf:about="http:xyz.com#">
<cd:algorithmid>DPOT-5ab247867d368</cd:algorithmid>
<cd:owner>arun</cd:owner>
<cd:acesskey>ACCESS-5ab247867d370</cd:acesskey>
<cd:purpose>Research</cd:purpose>
<cd:metadata>10</cd:metadata>
<cd:completeness>Partial</cd:completeness>
<cd:completeness>Yes</cd:completeness>
<cd:inclusion_1>age</cd:inclusion_1>
<cd:feature_1>Sex</cd:feature_1>
<cd:target>Diagnosis</cd:target>
</rdf:Description>

</rdf:RDF>

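From the text above, I need to extract the target (that is, only the value between the opening and closing "cd:target" tags). The desired output is "Diagnosis". I tried using an XML parser, but it does not work because the tag names contain ":". Is there a better solution?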

Update: here is what I tried. Apologies for the naive coding style.

import xml.etree.ElementTree as et

def metadataParser(metadataFile):
    with open(metadataFile, 'r') as m:
        data = m.read()
        # Load the xml content from a string
        content = et.fromstring(data)
        description = content.find('rdf:Description')
        target = description.find("cd:target")

    return target


target = metadataParser('metadata.rdf')
print(target)

Best answer

You can use the BeautifulSoup module with its XML parser. Note that the 'xml' parser requires the lxml package to be installed.

from bs4 import BeautifulSoup

XML = '''
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http:xyz.com#">

<rdf:Description rdf:about="http:xyz.com#">
<cd:algorithmid>DPOT-5ab247867d368</cd:algorithmid>
<cd:owner>arun</cd:owner>
<cd:acesskey>ACCESS-5ab247867d370</cd:acesskey>
<cd:purpose>Research</cd:purpose>
<cd:metadata>10</cd:metadata>
<cd:completeness>Partial</cd:completeness>
<cd:completeness>Yes</cd:completeness>
<cd:inclusion_1>age</cd:inclusion_1>
<cd:feature_1>Sex</cd:feature_1>
<cd:target>Diagnosis</cd:target>
</rdf:Description>

</rdf:RDF>'''

soup = BeautifulSoup(XML, 'xml')

target = soup.find('target').text
print(target)
# Diagnosis

As you can see, it is very easy to use.
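The ElementTree attempt from the question can probably be made to work as well. ElementTree stores each tag under its full namespace URI (for example `{http:xyz.com#}target`), so a lookup by prefix alone does not match unless a prefix-to-URI map is passed in. The following is a minimal sketch under that assumption, not part of the original answer. The names `NAMESPACES`, `find_target`, and `find_all_completeness` are hypothetical, and the sample document is a shortened copy of the one in the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the RDF document from the question.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http:xyz.com#">
<rdf:Description rdf:about="http:xyz.com#">
<cd:completeness>Partial</cd:completeness>
<cd:completeness>Yes</cd:completeness>
<cd:target>Diagnosis</cd:target>
</rdf:Description>
</rdf:RDF>"""

# Hypothetical prefix-to-URI map. The URIs must match the xmlns
# declarations in the document exactly, character for character.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "cd": "http:xyz.com#",
}


def find_target(xml_text):
    """Return the text of the first cd:target element, or None if absent."""
    root = ET.fromstring(xml_text)
    description = root.find("rdf:Description", NAMESPACES)
    if description is None:
        return None
    return description.findtext("cd:target", default=None, namespaces=NAMESPACES)


def find_all_completeness(xml_text):
    """Return the text of every cd:completeness element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//cd:completeness", NAMESPACES)]


print(find_target(RDF_XML))            # Diagnosis
print(find_all_completeness(RDF_XML))  # ['Partial', 'Yes']
```

This variant needs only the standard library. To read from a file such as `metadata.rdf`, `ET.parse(path).getroot()` can be used in place of `ET.fromstring`.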

Regarding "python - How to find a string inside XML-like tags in a file using Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49494444/
