
Python lxml - Using the xml:lang attribute to retrieve an element

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:30:31

I have some XML that contains several elements with the same name, each in a different language, for example:

<Title xml:lang="FR" type="main">Les Tudors</Title>
<Title xml:lang="DE" type="main">Die Tudors</Title>
<Title xml:lang="IT" type="main">The Tudors</Title>

Normally, I would retrieve an element by one of its attributes like this:

titlex = info.find('.//xmlns:Title[@someattribute="attributevalue"]', namespaces=nsmap)
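For comparison, a predicate on an ordinary, un-prefixed attribute such as type needs no prefix mapping at all, as long as the value is quoted. A minimal sketch, assuming the three Title elements above sit under a hypothetical <root> wrapper with no default namespace (so the question's xmlns: prefix is dropped). It imports lxml when available and otherwise falls back to the standard library's xml.etree.ElementTree, whose find()/findall() accept the same path syntax:

```python
try:
    from lxml import etree  # library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same ElementPath syntax

# Hypothetical document: the question's three Title elements under a <root> wrapper
XML = """<root>
<Title xml:lang="FR" type="main">Les Tudors</Title>
<Title xml:lang="DE" type="main">Die Tudors</Title>
<Title xml:lang="IT" type="main">The Tudors</Title>
</root>"""

doc = etree.fromstring(XML)

# Un-prefixed attribute: no namespaces argument needed; the value is quoted
main_titles = doc.findall('Title[@type="main"]')
print([t.text for t in main_titles])  # ['Les Tudors', 'Die Tudors', 'The Tudors']
```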

If I try to do the same with [@xml:lang="FR"] (for example), I get this traceback:

  File "D:/Python code/RBM CRID, Title, Genre/CRID, Title, Genre, Age rating, Episode Number, Descriptions V1.py", line 29, in <module>
    titlex = info.find('.//xmlns:Title[@xml:lang=PL]', namespaces=nsmap)
  File "lxml.etree.pyx", line 1457, in lxml.etree._Element.find (src\lxml\lxml.etree.c:51435)
  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 282, in find
    it = iterfind(elem, path, namespaces)
  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 272, in iterfind
    selector = _build_path_iterator(path, namespaces)
  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 256, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))
  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 134, in prepare_predicate
    token = next()
  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 80, in xpath_tokenizer
    raise SyntaxError("prefix %r not found in prefix map" % prefix)
SyntaxError: prefix 'xml' not found in prefix map

This doesn't surprise me, but I would like advice on how to get around it.

Thanks!

As requested, here is a stripped-down but complete version of the code (if I remove the [bits in square brackets], it works as expected):

import lxml
import codecs

file_name = input('Enter the file name, excluding .xml extension: ') + '.xml'  # User inputs file name
print('Parsing ' + file_name)


#----- Sets up import and namespace

from lxml import etree

parser = lxml.etree.XMLParser()


tree = lxml.etree.parse(file_name, parser)  # Name of file to test goes here
root = tree.getroot()

nsmap = {'xmlns': 'urn:tva:metadata:2012',
         'mpeg7': 'urn:tva:mpeg7:2008'}

#----- This code writes the output to a file

with codecs.open(file_name + '.log', mode='w', encoding='utf-8') as f:  # Name the output file
    f.write(u'CRID|Title|Genre|Rating|Short Synopsis|Medium Synopsis|Long Synopsis\n')
    for info in root.xpath('//xmlns:ProgramInformation', namespaces=nsmap):
        titlex = info.find('.//xmlns:Title[@xml:lang="PL"]', namespaces=nsmap)  # Retrieve the title
        title = titlex.text if titlex is not None else 'Missing'  # If there isn't a title, print an alternative word
        f.write(u'{}\n'.format(title))  # Write all the retrieved values to the same line with bar separators and a new line
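The same SyntaxError can be reproduced without the input file. A minimal sketch, assuming a hypothetical one-element document and reusing the question's nsmap (which has no entry for xml). It imports lxml when available and otherwise falls back to the standard library's xml.etree.ElementTree, whose find() accepts the same path syntax and raises the same error. The exception comes from compiling the path, before any element is inspected:

```python
try:
    from lxml import etree  # library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same ElementPath syntax

# Hypothetical minimal document with one language-tagged Title
doc = etree.fromstring('<root><Title xml:lang="FR" type="main">Les Tudors</Title></root>')

# The question's prefix map: note there is no "xml" key
nsmap = {'xmlns': 'urn:tva:metadata:2012',
         'mpeg7': 'urn:tva:mpeg7:2008'}

error_message = None
try:
    # The tokenizer cannot resolve the xml: prefix, so the path never compiles
    doc.find('Title[@xml:lang="FR"]', namespaces=nsmap)
except SyntaxError as exc:
    error_message = str(exc)

print(error_message)  # prefix 'xml' not found in prefix map
```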

Best answer

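The xml prefix in xml:lang does not need to be declared in the XML document, but if you want to use xml:lang in an XPath (ElementPath) lookup, you must define the prefix mapping in your Python code.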

The xml prefix is reserved (as opposed to arbitrary "ordinary" namespace prefixes) and is by definition bound to http://www.w3.org/XML/1998/namespace. See the Namespaces in XML 1.0 W3C Recommendation.
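Because the prefix is fixed to that URI, a parsed xml:lang attribute is stored under the expanded ("Clark notation") name {http://www.w3.org/XML/1998/namespace}lang. A minimal sketch of what that allows, assuming the question's sample Titles under a hypothetical <root> wrapper: Element.get() can read the attribute with no prefix map at all, and the same URI drives the prefixed lookup. XML_NS and XML_LANG are illustrative names, not part of lxml; the import falls back to the standard library's xml.etree.ElementTree if lxml is missing:

```python
try:
    from lxml import etree  # library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same ElementPath syntax

# Reserved namespace URI that the xml: prefix is always bound to
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_LANG = "{%s}lang" % XML_NS  # expanded name of xml:lang (illustrative constant)

doc = etree.fromstring("""<root>
<Title xml:lang="FR" type="main">Les Tudors</Title>
<Title xml:lang="DE" type="main">Die Tudors</Title>
<Title xml:lang="IT" type="main">The Tudors</Title>
</root>""")

# Read the attribute by its expanded name: no namespaces argument required
titles_by_lang = {t.get(XML_LANG): t.text for t in doc.findall("Title")}
print(titles_by_lang)  # {'FR': 'Les Tudors', 'DE': 'Die Tudors', 'IT': 'The Tudors'}

# The prefixed form works once "xml" is mapped to the same URI
title_de = doc.find('Title[@xml:lang="DE"]', namespaces={"xml": XML_NS})
print(title_de.text)  # Die Tudors
```

Reading every language into a dict is handy when several languages are needed at once; the prefixed find() is the closer match to the question's code.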

Example:

from lxml import etree

# Required mapping
nsmap = {"xml": "http://www.w3.org/XML/1998/namespace"}

XML = """
<root>
<Title xml:lang="FR" type="main">Les Tudors</Title>
<Title xml:lang="DE" type="main">Die Tudors</Title>
<Title xml:lang="IT" type="main">The Tudors</Title>
</root>"""

doc = etree.fromstring(XML)

title_FR = doc.find('Title[@xml:lang="FR"]', namespaces=nsmap)
print(title_FR.text)

Output:

Les Tudors

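If the xml prefix is not mapped, you get the "prefix 'xml' not found in prefix map" error. If the xml prefix is mapped to any URI other than http://www.w3.org/XML/1998/namespace, the find method in the snippet above finds nothing and returns None.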
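Applied to the question's script, the xml entry simply goes into the same dictionary as the TVA prefixes. A minimal sketch, assuming a hypothetical, heavily trimmed TVA-style document: only the urn:tva:metadata:2012 namespace and the ProgramInformation/Title names come from the question, the rest of the structure is invented for illustration, and titles_for() is a hypothetical helper. It uses iterfind() instead of the question's xpath() call so that it also runs on the standard library's xml.etree.ElementTree when lxml is not installed. The last lines show the second failure mode described above: a wrong URI compiles fine but matches nothing.

```python
try:
    from lxml import etree  # library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same ElementPath syntax

# Hypothetical, trimmed-down TVA-style document (structure assumed for illustration)
XML = """<TVAMain xmlns="urn:tva:metadata:2012">
  <ProgramInformation>
    <Title xml:lang="FR" type="main">Les Tudors</Title>
    <Title xml:lang="DE" type="main">Die Tudors</Title>
  </ProgramInformation>
  <ProgramInformation>
    <Title xml:lang="IT" type="main">The Tudors</Title>
  </ProgramInformation>
</TVAMain>"""

root = etree.fromstring(XML)

# The question's prefixes plus the reserved xml prefix
nsmap = {
    "xmlns": "urn:tva:metadata:2012",
    "mpeg7": "urn:tva:mpeg7:2008",
    "xml": "http://www.w3.org/XML/1998/namespace",
}

def titles_for(lang, prefixes):
    """Hypothetical helper: one title per ProgramInformation, or 'Missing'."""
    results = []
    for info in root.iterfind(".//xmlns:ProgramInformation", namespaces=prefixes):
        titlex = info.find('.//xmlns:Title[@xml:lang="%s"]' % lang, namespaces=prefixes)
        results.append(titlex.text if titlex is not None else "Missing")
    return results

print(titles_for("FR", nsmap))  # ['Les Tudors', 'Missing']

# Wrong URI for xml: no attribute carries that expanded name, so nothing matches
wrong_map = dict(nsmap, xml="http://example.com/not-the-xml-namespace")
print(titles_for("FR", wrong_map))  # ['Missing', 'Missing']
```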

Regarding "Python lxml - Using the xml:lang attribute to retrieve an element", there is a similar question on Stack Overflow: https://stackoverflow.com/questions/31250641/
