gpt4 book ai didi

java - 它忽略实体引用解析的方式?

转载 作者:行者123 更新时间:2023-11-30 11:46:05 27 4
gpt4 key购买 nike

我使用的是 Java 6 和最新版本的 Xerces。我正在尝试解析这样开头的 HTML 文档 ...

<!DOCTYPE html> 

和后来引用实体“»”。解析死于异常......

org.xml.sax.SAXParseException: The entity "raquo" was referenced, but not declared. 
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at com.myco.myproject.util.XmlUtilities.getStringAsDocument(XmlUtilities.java:147)
at com.myco.myproject.util.NetUtilities.getUrlAsDocument(NetUtilities.java:65)
at com.myco.myproject.parsers.impl.AbstractMetromixParser.parsePage(AbstractMetromixParser.java:107)
at com.myco.myproject.parsers.impl.AbstractMetromixParser.getEvents(AbstractMetromixParser.java:76)
at com.myco.myproject.domain.EventFeed.refresh(EventFeed.java:81)
at com.myco.myproject.domain.EventFeed.getEvents(EventFeed.java:72)
at com.myco.myproject.parsers.impl.MetromixParserTest.testParser(MetromixParserTest.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:74)
at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:83)
at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:72)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:231)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:71)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:174)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

有没有办法告诉解析器忽略这些它无法解析的实体类型?如果没有,我必须插入什么解析器?

编辑:这是我解析 HTML 的方式,它实际上是 XHTML。在我尝试下面的操作之前,我通过 JSoup 传递 String 来清理它......

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setExpandEntityReferences(false);
final DocumentBuilder builder = factory.newDocumentBuilder();
final InputSource s = new InputSource(new StringReader(str));
org.w3c.dom.Document result = builder.parse(s);

最佳答案

1.10.3 版本开始,JSoup 提供了 W3CDom帮助程序类,它允许您将 org.jsoup.nodes.Document 直接转换为 org.w3c.dom.Document

考虑以下示例:

String str =
"<!DOCTYPE html>" +
"<html>" +
"<dody>" +
"<div>&raquo; example</div>" +
"</dody>" +
"</html>";

Document document = Jsoup.parse(str);
W3CDom w3cDom = new W3CDom();
org.w3c.dom.Document result = w3cDom.fromJsoup(document);

关于java - 它忽略实体引用解析的方式?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/9981786/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com