
java - "Invalid byte 1 of 1-byte UTF-8 sequence" when reading an RSS feed

Reposted. Author: 行者123. Updated: 2023-11-30 04:43:55

My code is very simple:

DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = db.parse("http://blog.rogermontgomery.com/feed/?cat=skaffold");

The problem is that it ends with an exception:

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.arrangeCapacity(XMLEntityScanner.java:1619)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipString(XMLEntityScanner.java:1657)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:193)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:772)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:232)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
at com.skaffold.service.RogerBlogReader.read(RogerBlogReader.java:33)
[...]

I don't understand: the XML header declares the document as UTF-8, and the HTTP response is encoded as UTF-8. Is there any explanation?

Best answer

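Not every byte sequence is valid UTF-8. A UTF-8 decoder can read a single byte and tell from its value alone that it is illegal in UTF-8. It sounds like your RSS feed is bad: it probably claims to be UTF-8 but is actually encoded differently, for example as ISO-8859-1.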
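To illustrate how a decoder can reject bytes outright, here is a minimal sketch using the JDK's strict `CharsetDecoder`. The class name `Utf8Check` and the method `isValidUtf8` are made up for this example and are not part of the original question. It checks three inputs: real UTF-8, the same text encoded as ISO-8859-1, and the two-byte gzip magic number `0x1f 0x8b`, whose second byte cannot start a UTF-8 sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Check {

    // Returns true only if the whole array decodes as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] gzipMagic = { 0x1f, (byte) 0x8b };

        System.out.println(isValidUtf8(utf8));      // true
        System.out.println(isValidUtf8(latin1));    // false: 0xE9 announces a 3-byte sequence that never arrives
        System.out.println(isValidUtf8(gzipMagic)); // false: 0x8B is a continuation byte, not a lead byte
    }
}
```

Running a check like this on the raw response bytes, before they reach the XML parser, is one way to tell a mislabeled or compressed payload apart from a genuine parser problem.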

Update: the feed URL is gzip-compressed. Have you tried decompressing it?
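If the response body really is gzip-compressed, one possible approach is to check the first two bytes for the gzip magic number and decompress before handing the stream to the parser. The following is a sketch under that assumption, not a confirmed fix for this particular feed. The class `GzipAwareFeedParser` and its methods are hypothetical names introduced here.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
class GzipAwareFeedParser {

    // Peeks at the first two bytes; if they are the gzip magic number
    // (0x1f 0x8b), wraps the stream so that it is decompressed on read.
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int b0 = in.read();
        int b1 = in.read();
        in.reset(); // rewind so the parser (or GZIPInputStream) sees every byte
        boolean gzipped = (b0 == 0x1f && b1 == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }

    // Parses XML from a stream that may or may not be gzip-compressed.
    static Document parse(InputStream raw) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = maybeGunzip(raw)) {
            return db.parse(in);
        }
    }
}
```

With this in place, the original call could become something like `GzipAwareFeedParser.parse(new java.net.URL(feedUrl).openStream())`, where `feedUrl` is the feed address from the question. Sniffing the magic bytes rather than trusting the `Content-Encoding` header is a design choice for this sketch; it also covers servers that compress the body without declaring it.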

Regarding java - "Invalid byte 1 of 1-byte UTF-8 sequence" when reading an RSS feed, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11571457/
