
java - DOM parser freezes on HTML with a DOCTYPE declaration

Reposted. Author: 行者123. Updated: 2023-12-01 09:33:34

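This program reads two HTML files from my site and parses each one. The first file (pass.html) has no DOCTYPE declaration, and it parses without problems.

The second file (freeze.html) has a DOCTYPE declaration. The W3C validation service reports freeze.html as fully valid. However, when I try to parse freeze.html, the program hangs at .parse(is).

What is going wrong?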

import java.io.InputStream;
import java.net.URL;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;

class DOMCallFreezes {
    public static void main(String[] args) throws Exception {
        new DOMCallFreezes().main();
    }

    void main() throws Exception {
        demo("pass.html");
        demo("freeze.html");
    }

    void demo(String htmlName) throws Exception {
        final String baseUrl = "http://x19290.appspot.com/dom-no-good/";
        URL url = new URL(baseUrl + htmlName);
        try (final InputStream is = url.openStream()) {
            final Document doc = newDocumentBuilder().parse(is);
            final DOMSource src = new DOMSource(doc);
            final StreamResult dst = new StreamResult(System.out);
            newTransformer().transform(src, dst);
        }
    }

    DocumentBuilder newDocumentBuilder() throws Exception {
        final DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        return f.newDocumentBuilder();
    }

    Transformer newTransformer() throws Exception {
        final TransformerFactory f = TransformerFactory.newInstance();
        return f.newTransformer();
    }
}

pass.html

<?xml version="1.0" encoding="US-ASCII"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>pass</title>
</head>
<body>
<h1>no DOCTYPE declaration</h1>
</body>
</html>

freeze.html

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>freeze</title>
</head>
<body>
<h1>has DOCTYPE declaration</h1>
</body>
</html>

Best answer

The following settings tell the parser not to load the external DTD referenced by the DOCTYPE declaration. Change the method newDocumentBuilder():

DocumentBuilder newDocumentBuilder() throws Exception {
    final DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
    f.setValidating(false);
    f.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    return f.newDocumentBuilder();
}
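
To try the fix without depending on the site in the question, here is a minimal self-contained sketch. It assumes the JDK's built-in Xerces-derived parser, which recognizes the load-external-dtd feature URI used above; other parser implementations may reject it. The class name OfflineDomParse and the helper rootName are hypothetical, and setFeature is used here as the typed alternative to setAttribute for boolean switches. The document is parsed from an in-memory string that mirrors freeze.html, so with the feature turned off the parser should not need to fetch the DTD from www.w3.org.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class OfflineDomParse {
    // Xerces-specific feature URI, the same one used in the answer.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Builds a non-validating parser that skips the external DTD.
    static DocumentBuilder newDocumentBuilder() throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(false);
        // Throws ParserConfigurationException if the parser does not know this feature.
        f.setFeature(LOAD_EXTERNAL_DTD, false);
        return f.newDocumentBuilder();
    }

    // Hypothetical helper: parses the given XML text and returns the root element name.
    static String rootName(String xml) throws Exception {
        InputSource src = new InputSource(new StringReader(xml));
        Document doc = newDocumentBuilder().parse(src);
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Same shape as freeze.html, including the XHTML 1.1 DOCTYPE.
        String xml =
                "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">\n"
                + "<head><title>freeze</title></head>\n"
                + "<body><h1>has DOCTYPE declaration</h1></body>\n"
                + "</html>\n";
        System.out.println(rootName(xml));
    }
}
```

One caveat with this approach: because the DTD is never read, entity references that XHTML defines only in its DTD (for example &nbsp;) would not be resolvable, so it suits documents like freeze.html that do not use them.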

Regarding "java - DOM parser freezes on HTML with a DOCTYPE declaration", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39189174/
