xml - Clojure XML parsing is slow when using a DTD

Reposted. Author: 行者123. Updated: 2023-12-05 07:59:31

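I'm parsing XML with clojure.data.xml/parse. Unfortunately, the XML the server sends back is malformed: it contains escaped Unicode and special characters (such as &mdash;) but no DTD. I worked around this by manually inserting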

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

into the XML, but when I do, the parse time jumps from under 1 second to over 15 seconds.

So far I have turned off validation by passing :validating false to the parse function, but this is suboptimal. Is there any way to speed this up?

Edit: an example of the document being sent:

<?xml version="1.0" encoding="utf-8"?>
<book>
<entry>
<id>192</id>
<title>A book &mdash; Title</title>
<synopsis>A long-winded, multi-paragraph synopsis with unicode</synopsis>
</entry>
</book>

Error: XMLStreamException ParseError at [row,col]:[30,267] Message: The entity "mdash" was referenced, but not declared. com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next (XMLStreamReaderImpl.java:598)

Best answer

As mentioned above:

The delay in parsing when using the DTD is very likely caused by the parser actually fetching the DTD over the network. It's best practice to save the DTD locally instead of referencing the copy on w3.org's website: saving a local copy of the DTD and referencing it with a local path will speed that part up. For entity resolution (mdash), the entities need to be declared in the DTD; see:
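A minimal sketch of the "local DTD" suggestion, written against the JVM's StAX API (javax.xml.stream), which is the parser reporting the error above (XMLStreamReaderImpl) and which clojure.data.xml builds on. It assumes the response can be patched as a string before parsing. The class and method names, the temp-file DTD, and its two entity declarations are hypothetical stand-ins for a saved copy of the real DTD. Because the inserted DOCTYPE points at a file: URI, nothing is fetched from w3.org.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class LocalDtdExample {

    // Hypothetical stand-in for a locally saved DTD: it declares only the
    // entities the server is known to send (U+2014 em dash, U+2013 en dash).
    static final String LOCAL_DTD =
            "<!ENTITY mdash \"&#8212;\">\n"
          + "<!ENTITY ndash \"&#8211;\">\n";

    // Inserts a DOCTYPE pointing at the local file right after the XML
    // declaration (a DOCTYPE may not precede the declaration).
    static String withLocalDoctype(String xml, Path dtd) {
        String doctype = "<!DOCTYPE book SYSTEM \"" + dtd.toUri() + "\">";
        int declEnd = xml.startsWith("<?xml") ? xml.indexOf("?>") + 2 : 0;
        return xml.substring(0, declEnd) + doctype + xml.substring(declEnd);
    }

    // Returns the text of the first <title> element, with entities expanded
    // from the declarations in the local DTD.
    static String firstTitle(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, true);
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, true);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("title")) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Write the stand-in DTD to a temporary local file.
        Path dtd = Files.createTempFile("entities", ".dtd");
        dtd.toFile().deleteOnExit();
        Files.writeString(dtd, LOCAL_DTD);

        String doc = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<book><entry><id>192</id>"
                + "<title>A book &mdash; Title</title></entry></book>";
        System.out.println(firstTitle(withLocalDoctype(doc, dtd)));
    }
}
```

The same factory and reader calls are reachable from Clojure through Java interop. Declaring only the entities you need keeps the local DTD small; a saved copy of the full XHTML DTD works the same way, provided the entity files it references by relative path are saved next to it.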

Replace special characters like &ndash; and &mdash; occurring in an xml document with corresponding code like &#150; etc
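The linked approach avoids a DTD entirely by rewriting named entities to numeric character references before parsing. Below is a minimal sketch, assuming the response can be preprocessed as a string before it reaches clojure.data.xml/parse; the entity table and the class and method names are hypothetical. Note that XML character references are Unicode code points, so the en dash is &#8211; and the em dash &#8212;. The &#150; in the linked title is the Windows-1252 value, which in XML denotes the control character U+0096 rather than a dash.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EntityRewriteExample {

    // Hypothetical table: extend it with whatever named entities the server emits.
    // Values are Unicode code points (U+2014 em dash, U+2013 en dash, U+00A0 nbsp).
    static final Map<String, Integer> CODE_POINTS =
            Map.of("mdash", 0x2014, "ndash", 0x2013, "nbsp", 0x00A0);

    // A named reference such as &mdash; (numeric ones like &#8212; do not match).
    static final Pattern NAMED_REF = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Rewrites known named references to decimal character references.
    // Unknown names, including XML built-ins such as &amp;, are left untouched.
    static String toNumericReferences(String xml) {
        Matcher m = NAMED_REF.matcher(xml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer cp = CODE_POINTS.get(m.group(1));
            String replacement = (cp == null) ? m.group() : "&#" + cp + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Parses with no DTD at all and returns the text of the first <title>.
    static String firstTitle(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("title")) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<book><entry><title>A book &mdash; Title</title></entry></book>";
        System.out.println(firstTitle(toNumericReferences(doc)));
    }
}
```

Compared with a DTD, this trades a fixed lookup table for zero extra I/O: the parser never needs to resolve anything, so validation settings no longer matter for these entities.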

Regarding "xml - Clojure XML parsing is slow when using a DTD", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21564652/