
Java CharsetDecoder inserts a space after every character

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 20:04:02

I'm trying to strip invalid UTF-8 characters using this code (found on Stack Overflow):

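import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.Charset
import java.nio.charset.CharsetDecoder
import java.nio.charset.CodingErrorAction

// `file` is a java.io.File pointing at the input document
def text = file.text
CharsetDecoder utf8Decoder = Charset.forName("UTF-8").newDecoder();
utf8Decoder.onMalformedInput(CodingErrorAction.IGNORE);
utf8Decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
ByteBuffer bytes = ByteBuffer.allocate(text.getBytes().length * 2)
CharBuffer cbuf = bytes.asCharBuffer()
cbuf.put(text)
cbuf.flip()
CharBuffer parsed = utf8Decoder.decode(bytes);
println parsed.toString()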

The output I get looks like this:

 < d o c u m e n t >
< t i t l e > S o me T i t l e < / t i t l e >
< s i t e > A S i t e < / s i t e >

Any ideas why it behaves this way?
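A minimal Java sketch that replays the question's buffer steps on a short hypothetical string (the class and method names are illustrative, and it assumes plain ASCII input). `asCharBuffer()` returns a view that stores each `char` as two bytes (UTF-16, big-endian by default), so ASCII text becomes `00 3C 00 64 ...`. Those bytes are then decoded as UTF-8, where each `00` byte is a perfectly valid NUL character (U+0000), which many consoles render as a space. That would also explain the leading space in the output above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class AsCharBufferDemo {
    // Replays the question's steps: write chars into a ByteBuffer view, then decode the bytes as UTF-8.
    static String reproduce(String text) throws CharacterCodingException {
        CharsetDecoder utf8Decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        // Two bytes per char: exactly enough for the UTF-16 form of text
        ByteBuffer bytes = ByteBuffer.allocate(text.length() * 2);
        // asCharBuffer() stores each char as two big-endian bytes, so 'd' becomes 0x00 0x64
        bytes.asCharBuffer().put(text);
        // Writing through the view does not move the ByteBuffer's position, so decode() reads from index 0
        return utf8Decoder.decode(bytes).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String decoded = reproduce("<doc>");
        System.out.println(decoded.length());               // 10: one NUL before every original char
        System.out.println(decoded.replace((char) 0, '_')); // _<_d_o_c_>
    }
}
```

Note also that by the time the decoder runs, `file.text` has already turned the file's bytes into a `String`, so any invalid UTF-8 was handled during that earlier read, not by this decoder.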

Best answer

I don't know why that doesn't work, but this is how to fix it (the code is Groovy, not Java):

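import java.nio.charset.Charset
import java.nio.charset.CharsetDecoder
import java.nio.charset.CodingErrorAction

// withInputStream hands the closure the raw bytes, so nothing has been decoded yet
def cleaned = file.withInputStream { stream ->
    CharsetDecoder utf8Decoder = Charset.forName("UTF-8").newDecoder();
    utf8Decoder.onMalformedInput(CodingErrorAction.IGNORE);
    utf8Decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
    // The reader decodes with our decoder, dropping invalid UTF-8 sequences
    def reader = new BufferedReader(new InputStreamReader(stream, utf8Decoder))
    def line = null

    def sb = new StringBuilder()
    while ( (line = reader.readLine()) != null) {
        sb.append("$line\n")
    }
    reader.close()
    sb.toString() // the closure's result is what withInputStream returns
}
println cleaned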
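For plain Java, the answer's approach translates roughly as below. This is a sketch: the class name, method name, and sample bytes are hypothetical. The point is that `InputStreamReader(InputStream, CharsetDecoder)` decodes the raw bytes with the configured decoder, so malformed UTF-8 is dropped before any `String` exists.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Cleaner {
    // Reads the stream as UTF-8, silently dropping malformed or unmappable byte sequences.
    static String readIgnoringInvalidUtf8(InputStream in) throws IOException {
        CharsetDecoder utf8Decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, utf8Decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input: 0xFF never appears in valid UTF-8
        byte[] raw = {'<', 't', 'i', 't', 'l', 'e', '>', (byte) 0xFF, 'A',
                      '<', '/', 't', 'i', 't', 'l', 'e', '>'};
        String cleaned = readIgnoringInvalidUtf8(new ByteArrayInputStream(raw));
        System.out.print(cleaned); // <title>A</title> followed by a newline
    }
}
```

`readLine()` strips the original line terminators and the loop re-adds `\n`, so files with `\r\n` endings come back with `\n` only. If line structure does not matter, calling `utf8Decoder.decode(ByteBuffer.wrap(bytes))` on the file's raw bytes applies the same filtering in a single call.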

A similar question about Java CharsetDecoder inserting a space after every character can be found on Stack Overflow: https://stackoverflow.com/questions/24418175/
