gpt4 book ai didi

python - 如何在 Python 中验证具有多个命名空间的 XML?

转载 作者:数据小太阳 更新时间:2023-10-29 02:14:19 26 4
gpt4 key购买 nike

我正在尝试在 Python 2.7 中编写一些单元测试以验证我对 OAI-PMH 模式所做的一些扩展:http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd

我遇到的问题是具有多个嵌套 namespace 的业务是由上述 XSD 中的此规范引起的:

<complexType name="metadataType">
<annotation>
<documentation>Metadata must be expressed in XML that complies
with another XML Schema (namespace=#other). Metadata must be
explicitly qualified in the response.</documentation>
</annotation>
<sequence>
<any namespace="##other" processContents="strict"/>
</sequence>
</complexType>

这是我正在使用的代码片段:

import lxml.etree, urllib2

query = "http://localhost:8080/OAI-PMH?verb=GetRecord&by_doc_ID=false&metadataPrefix=nsdl_dc&identifier=http://www.purplemath.com/modules/ratio.htm"
schema_file = file("../schemas/OAI/2.0/OAI-PMH.xsd", "r")
schema_doc = etree.parse(schema_file)
oaischema = etree.XMLSchema(schema_doc)

request = urllib2.Request(query, headers=xml_headers)
response = urllib2.urlopen(request)
body = response.read()
response_doc = etree.fromstring(body)

try:
oaischema.assertValid(response_doc)
except etree.DocumentInvalid as e:
line = 1;
for i in body.split("\n"):
print "{0}\t{1}".format(line, i)
line += 1
print(e.message)

我最终遇到以下错误:

AssertionError: http://localhost:8080/OAI-PMH?verb=GetRecord&by_doc_ID=false&metadataPrefix=nsdl_dc&identifier=http://www.purplemath.com/modules/ratio.htm
Element '{http://www.openarchives.org/OAI/2.0/oai_dc/}oai_dc': No matching global element declaration available, but demanded by the strict wildcard., line 22

我理解错误,因为架构要求严格验证元数据元素的子元素,示例 xml 就是这样做的。

现在我已经用 Java 编写了一个可以工作的验证器——但是如果用 Python 编写它会很有帮助,因为我正在构建的解决方案的其余部分是基于 Python 的。为了让我的 Java 变体工作,我必须让我的 DocumentFactory 命名空间感知,否则我会得到同样的错误。我没有在 python 中找到任何可以正确执行此验证的工作示例。

当我的示例文档使用 Python 验证时,有没有人知道如何获得具有多个嵌套命名空间的 XML 文档?

这是我要验证的示例 XML 文档:

<?xml version="1.0" encoding="UTF-8"?> 
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2002-02-08T08:55:46Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv.org:cs/0112017"
metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
<GetRecord>
<record>
<header>
<identifier>oai:arXiv.org:cs/0112017</identifier>
<datestamp>2001-12-14</datestamp>
<setSpec>cs</setSpec>
<setSpec>math</setSpec>
</header>
<metadata>
<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Using Structural Metadata to Localize Experience of
Digital Content</dc:title>
<dc:creator>Dushay, Naomi</dc:creator>
<dc:subject>Digital Libraries</dc:subject>
<dc:description>With the increasing technical sophistication of
both information consumers and providers, there is
increasing demand for more meaningful experiences of digital
information. We present a framework that separates digital
object experience, or rendering, from digital object storage
and manipulation, so the rendering can be tailored to
particular communities of users.
</dc:description>
<dc:description>Comment: 23 pages including 2 appendices,
8 figures</dc:description>
<dc:date>2001-12-14</dc:date>
</oai_dc:dc>
</metadata>
</record>
</GetRecord>
</OAI-PMH>

最佳答案

lxml's doc on validation 中找到这个:

>>> schema_root = etree.XML('''\
... <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
... <xsd:element name="a" type="xsd:integer"/>
... </xsd:schema>
... ''')
>>> schema = etree.XMLSchema(schema_root)

>>> parser = etree.XMLParser(schema = schema)
>>> root = etree.fromstring("<a>5</a>", parser)

所以,也许,您需要的是这个? (请参阅最后两行。):

schema_doc = etree.parse(schema_file)
oaischema = etree.XMLSchema(schema_doc)

request = urllib2.Request(query, headers=xml_headers)
response = urllib2.urlopen(request)
body = response.read()
parser = etree.XMLParser(schema = oaischema)
response_doc = etree.fromstring(body, parser)

关于python - 如何在 Python 中验证具有多个命名空间的 XML?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5332985/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com