
python - How to validate against multiple xsd schemas using lxml?

Reposted. Author: 行者123. Updated: 2023-11-28 21:09:04

I'm writing a unit test that validates the sitemap XML I generate, by fetching its xsd schemas and validating against them with Python's lxml library.

Here is some of the metadata on my root element:

xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
http://www.google.com/schemas/sitemap-image/1.1
http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd"

And this is the test code:

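from io import BytesIO

import requests
from lxml import etree

_xsd_validators = {}


def get_xsd_validator(url):
    # Download and compile each schema once, then cache it by URL.
    if url not in _xsd_validators:
        # requests returns bytes in .content, so wrap it in BytesIO
        xsd_doc = etree.parse(BytesIO(requests.get(url).content))
        _xsd_validators[url] = etree.XMLSchema(xsd_doc)
    return _xsd_validators[url]


# this util function is later on in a TestCase
def validate_xml(self, content):
    content.seek(0)
    doc = etree.parse(content)
    # schemaLocation holds whitespace-separated "namespace URL" pairs;
    # split() with no argument also copes with newlines and repeated spaces
    schema_loc = doc.getroot().attrib.get('{http://www.w3.org/2001/XMLSchema-instance}schemaLocation').split()
    # lxml doesn't like multiple namespaces
    for i, loc in enumerate(schema_loc):
        if i % 2 == 1:  # odd positions are the schema URLs
            get_xsd_validator(loc).assertValid(doc)
    return doc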

Sample XML that fails validation:

<?xml version="1.0" encoding="UTF-8"?>
<urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.sitemaps.org/schemas/sitemap/0.9
        http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
        http://www.google.com/schemas/sitemap-image/1.1
        http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd"
>
    <url>
        <loc>https://www.example.com/press</loc>
        <lastmod>2016-08-11</lastmod>
        <changefreq>weekly</changefreq>
    </url>

    <url>
        <loc>https://www.example.com/about-faq</loc>
        <lastmod>2016-08-11</lastmod>
        <changefreq>weekly</changefreq>
    </url>
</urlset>

Everything works fine when I only have a regular sitemap, but as soon as I add the image sitemap markup, assertValid starts failing:

E   DocumentInvalid: Element '{http://www.google.com/schemas/sitemap-image/1.1}image': No matching global element declaration available, but demanded by the strict wildcard., line 12

Or:

E   DocumentInvalid: Element '{http://www.sitemaps.org/schemas/sitemap/0.9}urlset': No matching global declaration available for the validation root., line 6

Best answer

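You could try defining a wrapper schema, wrapper-schema.xsd, that imports all the schemas you need, and use that one schema with lxml instead of each of the individual schemas.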

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:import
        namespace="http://www.sitemaps.org/schemas/sitemap/0.9"
        schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"/>
    <xs:import
        namespace="http://www.google.com/schemas/sitemap-image/1.1"
        schemaLocation="http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd"/>
</xs:schema>

I don't have Python at hand, but this validates successfully in oXygen:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="wrapper-schema.xsd"
>
    <image:image>
        <image:loc>http://www.example.com/image</image:loc>
    </image:image>
    <url>
        <loc>https://www.example.com/press</loc>
        <lastmod>2016-08-11</lastmod>
        <changefreq>weekly</changefreq>
    </url>
    <url>
        <loc>https://www.example.com/about-faq</loc>
        <lastmod>2016-08-11</lastmod>
        <changefreq>weekly</changefreq>
    </url>
</urlset>
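
Since the answer stops short of the Python side, here is a rough sketch of how the wrapper idea might be wired into the test. It is an illustration, not part of the original answer: the helper names parse_schema_location, build_wrapper_xsd and make_validator are made up, and the wrapper is generated from the document's own xsi:schemaLocation value instead of being written by hand. Only the standard library is needed to build the wrapper; lxml is imported lazily in the last helper.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def parse_schema_location(value):
    """Split an xsi:schemaLocation value into (namespace, url) pairs."""
    tokens = value.split()  # any run of whitespace, including newlines
    if len(tokens) % 2:
        raise ValueError("schemaLocation must hold namespace/URL pairs")
    return list(zip(tokens[0::2], tokens[1::2]))


def build_wrapper_xsd(pairs):
    """Return a wrapper schema (as bytes) with one xs:import per pair."""
    ET.register_namespace("xs", XS)
    root = ET.Element("{%s}schema" % XS)
    for namespace, url in pairs:
        ET.SubElement(
            root,
            "{%s}import" % XS,
            {"namespace": namespace, "schemaLocation": url},
        )
    return ET.tostring(root, encoding="utf-8")


def make_validator(wrapper_xsd):
    """Compile the wrapper with lxml (hypothetical helper, needs lxml)."""
    from lxml import etree  # third-party; imported only when called
    return etree.XMLSchema(etree.fromstring(wrapper_xsd))
```

In validate_xml, the per-schema loop could then be replaced by something like make_validator(build_wrapper_xsd(parse_schema_location(value))).assertValid(doc), so the document is checked once against all imported namespaces together. Whether lxml can fetch the imported schemas from their remote URLs depends on the libxml2 build and parser settings, so saving the XSD files locally and pointing schemaLocation at those copies is probably the safer choice in a unit test.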

Regarding "python - How to validate against multiple xsd schemas using lxml?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38910347/
