
Java: MalformedByteSequenceException (XML)

Reposted. Author: 行者123. Updated: 2023-12-01 16:09:07

I'm trying to parse XML using this class. When I feed it a simple file, it works fine.

<testData>
<text>
odp
</text>
</testData>

Here is my main method:

public static void main(String[] args) {
    Xml train = new Xml(args[0], "trainingData");
    Xml test = new Xml(args[1], "testData");
}

However, when I use a file whose contents I copied and pasted from MSFT Office OneNote, I get this error:

Exception in thread "main" java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
at odp.compling.Xml.rootElement(Xml.java:41)
at odp.compling.Xml.<init>(Xml.java:61)
at odp.compling.ParseTreeAnalysis2.main(ParseTreeAnalysis2.java:10)
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at odp.compling.Xml.rootElement(Xml.java:33)
... 2 more

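What is causing this? I edited the problematic XML file in Notepad++ and changed the encoding to UTF-8. That made a bunch of strange characters show up where the accents and special quotes were, and I edited those out. Could I have converted the file improperly?

(I know nothing about text encoding formats, in case you couldn't tell.)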

Best answer

Your file is not correctly encoded as UTF-8, but your parser expects UTF-8.
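To confirm this, you can check where the file stops being valid UTF-8. The following is a minimal sketch, not part of the original answer: the class name `Utf8Check` and the method `firstInvalidOffset` are made up for illustration. It uses the standard `CharsetDecoder` in strict mode to report the offset of the first malformed byte. A lone byte such as 0x93 (a curly quote in windows-1252, which Office text often contains) would be reported this way, although whether that is what your file contains is only a guess.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not from the original post): finds the first byte
// that a strict UTF-8 decoder rejects.
public class Utf8Check {

    /** Returns the offset of the first malformed byte, or -1 if the data is valid UTF-8. */
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On error, the input position is at the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                // All input was consumed without errors.
                return -1;
            }
            // Overflow: the output buffer is full; discard it and keep decoding.
            out.clear();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstInvalidOffset(data);
        if (offset < 0) {
            System.out.println("valid UTF-8");
        } else {
            System.out.println(String.format(
                    "first invalid byte 0x%02x at offset %d", data[offset] & 0xff, offset));
        }
    }
}
```

Run it with the path of the failing XML file as the only argument. If it reports an offset, open the file at that position to see which character was not converted.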

If you post a hex dump of the file, that would help pinpoint the problem.
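If no hex dump tool is at hand, a few lines of Java can produce one. This is only a sketch under assumptions: the class name `HexDump` and its `dump` method are hypothetical, and the output layout (an 8-digit offset followed by up to 16 bytes per row) is one arbitrary choice among many.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not from the original post): prints a file as hex,
// 16 bytes per row, each row prefixed with its offset.
public class HexDump {

    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < data.length; row += 16) {
            sb.append(String.format("%08x ", row));
            int end = Math.min(row + 16, data.length);
            for (int i = row; i < end; i++) {
                // Mask with 0xff so negative byte values print as two hex digits.
                sb.append(String.format(" %02x", data[i] & 0xff));
            }
            sb.append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java HexDump <file>");
            return;
        }
        System.out.print(dump(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

In valid UTF-8, every byte of 0x80 or above belongs to a multi-byte sequence, so an isolated high byte surrounded by plain ASCII in the dump is a likely culprit.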

Regarding Java: MalformedByteSequenceException (XML), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1871340/
