python lxml findall with multiple namespaces


Reposted by: 太空宇宙 | Updated: 2023-11-03 15:02:07


I'm trying to use lxml to parse an XML document that has multiple namespaces, but I'm stuck getting the findall() method to return anything.

My XML:

<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 RSP_EWS_V1.6.xsd">
    <HistoryRecords>
        <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>60</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
            </HistoryRecord>
        </List>
    </HistoryRecords>
</MeasurementRecords>

My code:

from lxml import etree
from pprint import pprint

RSPxmlFile = '/home/user/Desktop/100_0000100004_3788_20160420144011263_records.xml'

with open(RSPxmlFile, 'rb') as f:
    tree = etree.parse(f)

root = tree.getroot()

# root.nsmap maps the default namespace to the key None
for node in tree.findall('MeasurementRecords', root.nsmap):
    print(node)
    print("parameter =", node.text)

This gives:

ValueError: empty namespace prefix is not supported in ElementPath

Some experiments I tried after reading this:

>>> root.nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://www.company.com/common/rsp/2012/07'}

>>> nsmap['foo']=nsmap[None]
>>> nsmap.pop(None)
'http://www.company.com/common/rsp/2012/07'
>>> nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 'foo': 'http://www.company.com/common/rsp/2012/07'}
>>> tree.xpath("//MeasurementRecords", namespaces=nsmap)
[]
>>> tree.xpath('/foo:MeasurementRecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>]
>>> tree.xpath('/foo:MeasurementRecords/HistoryRecords', namespaces=nsmap)
[]

But that doesn't seem to help.

So, more experiments:

>>> tree.findall('//{http://www.company.com/common/rsp/2012/07}MeasurementRecords')
[]
>>> print(root)
<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>
>>> print(tree)
<lxml.etree._ElementTree object at 0x6ffffda5368>
>>> for node in tree.iter():
...     print(node)
...
<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>
<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x6ffffda5cf8>
<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x6ffffda5f38>
...etc...
>>> tree.findall("//HistoryRecords", namespaces=nsmap)
[]
>>> tree.findall("//foo:MeasurementRecords/HistoryRecords", namespaces=nsmap)
[]

I'm stumped. I don't know what's going wrong.

Best answer

If you start from here:

>>> tree = etree.parse(open('data.xml'))
>>> root = tree.getroot()
>>>

this will not find any elements...

>>> root.findall('{http://www.company.com/common/rsp/2012/07}MeasurementRecords')
[]

...but that's because root is the MeasurementRecords element; it doesn't contain any MeasurementRecords elements. On the other hand, the following works fine:

>>> root.findall('{http://www.company.com/common/rsp/2012/07}HistoryRecords')
[<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x7fccd0332ef0>]
>>>
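A minimal self-contained sketch of the same point, assuming the question's sample document is inlined as bytes (without the stray trailing <HistoryRecords> tag, so it parses). findall() on an element searches only its direct children unless the path starts with .//, so the root's own tag is never matched by it.

```python
from lxml import etree

NS = "http://www.company.com/common/rsp/2012/07"

# The question's sample document, inlined so the sketch runs on its own.
XML = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 RSP_EWS_V1.6.xsd">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>60</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>"""

root = etree.fromstring(XML)

# root *is* the MeasurementRecords element...
print(root.tag)  # {http://www.company.com/common/rsp/2012/07}MeasurementRecords

# ...so looking for MeasurementRecords among its children finds nothing.
print(root.findall("{%s}MeasurementRecords" % NS))  # []

# Direct children in the same namespace are found.
children = root.findall("{%s}HistoryRecords" % NS)

# A leading './/' searches all descendants, not just direct children.
values = [e.text for e in root.findall(".//{%s}Value" % NS)]
print(values)  # ['60']
```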

With the xpath method, you can do something like this:

>>> nsmap={'a': 'http://www.company.com/common/rsp/2012/07',
... 'b': 'http://www.w3.org/2001/XMLSchema-instance'}
>>> root.xpath('//a:HistoryRecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x7fccd0332ef0>]
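A small sketch, again assuming the sample document is inlined, showing that the prefix name passed to xpath() is arbitrary ('a' and 'rsp' below are both made-up names) and that every step of the path needs one. This also explains why the question's '/foo:MeasurementRecords/HistoryRecords' returned [].

```python
from lxml import etree

XML = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>60</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>"""

root = etree.fromstring(XML)

# Any prefix works as long as it maps to the document's namespace URI;
# 'a' and 'rsp' are arbitrary names chosen for this sketch.
ns_a = {"a": "http://www.company.com/common/rsp/2012/07"}
ns_rsp = {"rsp": "http://www.company.com/common/rsp/2012/07"}

hits_a = root.xpath("//a:HistoryRecords", namespaces=ns_a)
hits_rsp = root.xpath("//rsp:HistoryRecords", namespaces=ns_rsp)

# XPath 1.0 has no default namespace: an unprefixed step matches only
# elements that are in *no* namespace, so this returns [].
unprefixed = root.xpath("/a:MeasurementRecords/HistoryRecords", namespaces=ns_a)
prefixed = root.xpath("/a:MeasurementRecords/a:HistoryRecords", namespaces=ns_a)

print(len(hits_a), len(hits_rsp), len(unprefixed), len(prefixed))  # 1 1 0 1
```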

So:

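The findall and find methods take names in {...namespace...}ElementName syntax. (They also accept ns:ElementName prefixes through a namespaces mapping, but, as the ValueError shows, that mapping must not contain the empty None prefix that root.nsmap uses for the default namespace.)
The xpath method needs namespace prefixes (ns:ElementName), which it looks up in the supplied namespaces mapping. The prefixes don't have to match the ones used in the original document, but the namespace URLs must. Every step of the path needs a prefix: XPath 1.0 has no default namespace, so an unprefixed name such as HistoryRecords only matches elements that are in no namespace.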
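A sketch of how the question's original findall() approach could be made to work, assuming the sample document is inlined: copy root.nsmap, drop the None key, and re-register the default namespace under an explicit prefix. The prefix 'rsp' is a hypothetical choice; any name works.

```python
from lxml import etree

XML = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 RSP_EWS_V1.6.xsd">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
  </HistoryRecords>
</MeasurementRecords>"""

root = etree.fromstring(XML)

# root.nsmap maps the default namespace to the key None, which ElementPath
# (find/findall) rejected in the lxml version from the question.
# Copy the mapping without None, then add the default namespace back
# under an explicit (hypothetical) prefix 'rsp'.
ns = {k: v for k, v in root.nsmap.items() if k is not None}
ns["rsp"] = root.nsmap[None]

# Every element name in the path needs the prefix.
for node in root.findall("rsp:HistoryRecords/rsp:ValueItemId", namespaces=ns):
    print("parameter =", node.text)
```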

So this works:

>>> root.find('{http://www.company.com/common/rsp/2012/07}HistoryRecords/{http://www.company.com/common/rsp/2012/07}ValueItemId')
<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x7fccd0332a70>
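Spelling out the full {namespace} on every step gets repetitive. One way to avoid that, sketched here with a hypothetical helper named clark, is to build each step with lxml's etree.QName, whose .text attribute is the {uri}localname form that find() expects.

```python
from lxml import etree

NS = "http://www.company.com/common/rsp/2012/07"

XML = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
  </HistoryRecords>
</MeasurementRecords>"""

root = etree.fromstring(XML)


def clark(*names):
    """Hypothetical helper: join local names into a '/'-separated
    ElementPath where every step is written as {NS}name."""
    return "/".join(etree.QName(NS, name).text for name in names)


path = clark("HistoryRecords", "ValueItemId")
print(path)

node = root.find(path)
print(node.text)
```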

Or this works:

>>> root.xpath('/a:MeasurementRecords/a:HistoryRecords/a:ValueItemId',namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x7fccd0330830>]
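To go from matching elements to actually reading the data, a sketch along these lines (assuming the sample document is inlined and reusing the prefix 'a') walks each HistoryRecord and evaluates relative XPath expressions against it. string(...) returns the element's text, or an empty string if the element is missing.

```python
from lxml import etree

XML = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>60</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>"""

root = etree.fromstring(XML)
nsmap = {"a": "http://www.company.com/common/rsp/2012/07"}

records = []
for rec in root.xpath("//a:HistoryRecord", namespaces=nsmap):
    # Relative paths such as 'a:Value' are evaluated with 'rec' as the
    # context node; each step still needs the prefix.
    records.append({
        "value": rec.xpath("string(a:Value)", namespaces=nsmap),
        "state": rec.xpath("string(a:State)", namespaces=nsmap),
        "timestamp": rec.xpath("string(a:TimeStamp)", namespaces=nsmap),
    })

for r in records:
    print(r["value"], r["state"], r["timestamp"])
```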

We found a similar question about python lxml findall with multiple namespaces on Stack Overflow: https://stackoverflow.com/questions/36777424/
