
java - W3C breaks XHTML 1.1 parsing by removing modules from its site

Reposted. Author: 行者123. Updated: 2023-12-01 17:59:52

The W3C recommended list of doctype declarations gives the following document type for XHTML 1.1:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

This is the same system ID recommended by A List Apart, the Wiley Dummies site, and others. It is one of the canonical system IDs for the modular XHTML 1.1 DTD.

Unfortunately, this modular DTD references other XML entities, and the W3C has removed some of them from its site, which breaks parsing completely.

You can test this in Java 11. Start with the following XHTML 1.1 file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>XHTML 1.1 Skeleton</title>
</head>
<body>
</body>
</html>

Try to parse it with the standard built-in Java parser:

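import java.io.BufferedInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Inside an instance method that declares the checked exceptions:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
final Document document;
try (InputStream inputStream = new BufferedInputStream(getClass().getResourceAsStream("xhtml-1.1-test.xhtml"))) {
    document = documentBuilder.parse(inputStream); // loads the DTD named in the DOCTYPE
}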

Parsing fails with a java.io.FileNotFoundException for http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod. Apparently the W3C has removed that entity from its site entirely.

If http://www.w3.org/MarkUp/DTD/xhtml11.dtd is used instead (it appears in a comment in the XHTML 1.1 specification DTD), parsing completes normally, although it takes about 10 minutes.

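Why does the W3C serve an incomplete set of entities at http://www.w3.org/TR/xhtml11/DTD/, breaking XHTML 1.1 parsing with the standard system ID? Why aren't all the modules that are served at http://www.w3.org/MarkUp/DTD/ available there as well? Whom at the W3C should I contact to get this fixed? (And why does HTTP access to these entities take so long?)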

Best answer

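The alternative URL you mention, http://www.w3.org/MarkUp/DTD/xhtml11.dtd, appears to be used consistently throughout the XHTML 1.1 specification, DTDs, and modules, and seems to be the one the W3C endorses, rather than http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd. My guess is that access to these declaration sets is deliberately throttled because the W3C doesn't want to serve them to the general public; you are expected to store them locally and use SGML/XML catalog files to map the identifiers to local entities/declaration sets.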
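On the Java side, one way to map identifiers to local copies without any catalog file is a custom org.xml.sax.EntityResolver. The following is only a minimal sketch under assumptions: the class name LocalDtdResolver, its parse helper, and the idea of keeping the downloaded files in a single directory are illustrative and not something the question or the W3C prescribes. It refuses every entity it has no local copy for, so the parser never goes to the network.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical resolver: serves external entities from a local directory, keyed by public ID. */
public class LocalDtdResolver implements EntityResolver {
    private final Path baseDir;
    private final Map<String, String> publicIdToFile;

    public LocalDtdResolver(Path baseDir, Map<String, String> publicIdToFile) {
        this.baseDir = baseDir;
        this.publicIdToFile = publicIdToFile;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        String fileName = publicId == null ? null : publicIdToFile.get(publicId);
        if (fileName == null) {
            // Fail instead of returning null, which would let the parser fetch systemId itself.
            throw new SAXException("No local copy for " + publicId + " / " + systemId);
        }
        Path local = baseDir.resolve(fileName);
        InputSource source = new InputSource(Files.newInputStream(local));
        source.setPublicId(publicId);
        source.setSystemId(local.toUri().toString());
        return source;
    }

    /** Parses an XML string with the given resolver installed. */
    public static Document parse(String xml, EntityResolver resolver) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

The map would need one entry per public ID that the DTD and its modules use, with each file name pointing at a copy you have downloaded into the directory yourself.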

I successfully validated an XHTML 1.1 file by invoking libxml2's xmllint command-line tool:

 SGML_CATALOG_FILES=./catalog xmllint --catalogs --dtdvalid xhtml11.dtd testdoc.xhtml

with a catalog file containing the following (and, of course, with the referenced .dtd, .mod, and .ent files located in that directory):

OVERRIDE YES

SGMLDECL "xml1.dcl"
PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd"
PUBLIC "-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN" "xhtml11-model-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Common Attributes 1.0//EN" "xhtml-attribs-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-attribs-1.mod" "xhtml-attribs-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Base Element 1.0//EN" "xhtml-base-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-base-1.mod" "xhtml-base-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML BDO Element 1.0//EN" "xhtml-bdo-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-bdo-1.mod" "xhtml-bdo-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Phrasal 1.0//EN" "xhtml-blkphras-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkphras-1.mod" "xhtml-blkphras-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Presentation 1.0//EN" "xhtml-blkpres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkpres-1.mod" "xhtml-blkpres-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Structural 1.0//EN" "xhtml-blkstruct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkstruct-1.mod" "xhtml-blkstruct-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Character Entities 1.0//EN" "xhtml-charent-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-charent-1.mod" "xhtml-charent-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Client-side Image Maps 1.0//EN" "xhtml-csismap-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-csismap-1.mod" "xhtml-csismap-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Datatypes 1.0//EN" "xhtml-datatypes-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod" "xhtml-datatypes-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Editing Markup 1.0//EN" "xhtml-edit-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-edit-1.mod" "xhtml-edit-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Intrinsic Events 1.0//EN" "xhtml-events-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-events-1.mod" "xhtml-events-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Forms 1.0//EN" "xhtml-form-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-form-1.mod" "xhtml-form-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Modular Framework 1.0//EN" "xhtml-framework-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-framework-1.mod" "xhtml-framework-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Hypertext 1.0//EN" "xhtml-hypertext-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-hypertext-1.mod" "xhtml-hypertext-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Images 1.0//EN" "xhtml-image-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-image-1.mod" "xhtml-image-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Phrasal 1.0//EN" "xhtml-inlphras-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlphras-1.mod" "xhtml-inlphras-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Presentation 1.0//EN" "xhtml-inlpres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlpres-1.mod" "xhtml-inlpres-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Structural 1.0//EN" "xhtml-inlstruct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlstruct-1.mod" "xhtml-inlstruct-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Inline Style 1.0//EN" "xhtml-inlstyle-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlstyle-1.mod" "xhtml-inlstyle-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Legacy Markup 1.0//EN" "xhtml-legacy-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-legacy-1.mod" "xhtml-legacy-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Link Element 1.0//EN" "xhtml-link-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-link-1.mod" "xhtml-link-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Lists 1.0//EN" "xhtml-list-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-list-1.mod" "xhtml-list-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Metainformation 1.0//EN" "xhtml-meta-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-meta-1.mod" "xhtml-meta-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Embedded Object 1.0//EN" "xhtml-object-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-object-1.mod" "xhtml-object-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Param Element 1.0//EN" "xhtml-param-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-param-1.mod" "xhtml-param-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Presentation 1.0//EN" "xhtml-pres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-pres-1.mod" "xhtml-pres-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Qualified Names 1.0//EN" "xhtml-qname-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-qname-1.mod" "xhtml-qname-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Ruby 1.0//EN" "xhtml-ruby-1.mod"
SYSTEM "http://www.w3.org/TR/ruby/xhtml-ruby-1.mod" "xhtml-ruby-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Scripting 1.0//EN" "xhtml-script-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-script-1.mod" "xhtml-script-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Server-side Image Maps 1.0//EN" "xhtml-ssismap-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-ssismap-1.mod" "xhtml-ssismap-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Document Structure 1.0//EN" "xhtml-struct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-struct-1.mod" "xhtml-struct-1.mod"
PUBLIC "-//W3C//DTD XHTML Style Sheets 1.0//EN" "xhtml-style-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-style-1.mod" "xhtml-style-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Tables 1.0//EN" "xhtml-table-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-table-1.mod" "xhtml-table-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Text 1.0//EN" "xhtml-text-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-text-1.mod" "xhtml-text-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN" "xhtml11-model-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml11-model-1.mod" "xhtml11-model-1.mod"
PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent" "xhtml-lat1.ent"
PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "xhtml-special.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-special.ent" "xhtml-special.ent"
PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-symbol.ent" "xhtml-symbol.ent"
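Because every entry above points at a file that has to sit next to the catalog, it can help to check that nothing is missing before running a parser against it. The following is a rough sketch of such a check; the class name CatalogFileCheck and its method are hypothetical, and it only looks at PUBLIC and SYSTEM lines written in the two-quoted-strings form used here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists local files named in a plain-text catalog that do not exist. */
public class CatalogFileCheck {
    // Matches: PUBLIC "identifier" "local-file"  or  SYSTEM "identifier" "local-file"
    private static final Pattern ENTRY =
            Pattern.compile("^\\s*(?:PUBLIC|SYSTEM)\\s+\"[^\"]*\"\\s+\"([^\"]*)\"\\s*$");

    public static Set<String> missingFiles(Path catalogFile) throws IOException {
        Path dir = catalogFile.toAbsolutePath().getParent();
        Set<String> missing = new TreeSet<>();
        for (String line : Files.readAllLines(catalogFile)) {
            Matcher m = ENTRY.matcher(line);
            // Other lines (OVERRIDE, SGMLDECL, blanks) are ignored by this sketch.
            if (m.matches() && !Files.isRegularFile(dir.resolve(m.group(1)))) {
                missing.add(m.group(1));
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path catalog = Path.of(args.length > 0 ? args[0] : "catalog");
        missingFiles(catalog).forEach(name -> System.out.println("missing: " + name));
    }
}
```

Run from the directory that holds the catalog, it prints one line per referenced file that is absent and nothing if the set is complete.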

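Note that this is the SGML/traditional/plain-text catalog syntax. If you want to use it with Java/JAXP, you will have to convert it into a catalog file in XML syntax.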
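A rough sketch of that conversion, and of using the result from JAXP, might look like the following. It assumes Java 9 or later for the javax.xml.catalog package; the class name XmlCatalogSketch and both method names are hypothetical. Only PUBLIC and SYSTEM entries are converted: OVERRIDE YES is expressed as prefer="public" on the root element, and SGMLDECL is dropped because it has no XML catalog counterpart.

```java
import java.io.StringReader;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: plain-text catalog lines to an XML catalog, then parsing with it. */
public class XmlCatalogSketch {
    private static final Pattern ENTRY =
            Pattern.compile("^\\s*(PUBLIC|SYSTEM)\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*$");

    /** Converts PUBLIC/SYSTEM lines into OASIS XML catalog syntax. */
    public static String toXmlCatalog(List<String> plainLines) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\"")
           .append(" prefer=\"public\">\n");
        for (String line : plainLines) {
            Matcher m = ENTRY.matcher(line);
            if (!m.matches()) {
                continue; // OVERRIDE, SGMLDECL and blank lines are skipped
            }
            boolean isPublic = m.group(1).equals("PUBLIC");
            xml.append(isPublic ? "  <public publicId=\"" : "  <system systemId=\"")
               .append(escape(m.group(2))).append("\" uri=\"")
               .append(escape(m.group(3))).append("\"/>\n");
        }
        return xml.append("</catalog>\n").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Parses an XML string, resolving external entities only through the given catalog. */
    public static Document parse(String xml, Path catalogFile) throws Exception {
        CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.PREFER, "public")
                .with(CatalogFeatures.Feature.RESOLVE, "strict") // unmatched IDs raise an error
                .build();
        CatalogResolver resolver = CatalogManager.catalogResolver(features, catalogFile.toUri());
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Relative uri values in the generated catalog are resolved against the location of the catalog file, so writing it into the same directory as the .dtd, .mod, and .ent files keeps the layout used with xmllint.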

Regarding "java - W3C breaks XHTML 1.1 parsing by removing modules from its site", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60655704/
