
java - Reading a text stream code point by code point

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 05:19:51.

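I'm trying to read Unicode code points from a text file in Java. The InputStreamReader class returns the stream's contents one int at a time, which I hoped would do what I want. However, it doesn't combine surrogate pairs: each read() returns a single UTF-16 code unit (a char), not a full code point.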

My test program:

import java.io.*;
import java.nio.charset.*;

class TestChars {
    public static void main(String[] args) {
        InputStreamReader reader =
            new InputStreamReader(System.in, StandardCharsets.UTF_8);
        try {
            System.out.print("> ");
            // read() returns one UTF-16 code unit (a char value), or -1 at end of stream
            int code = reader.read();
            while (code != -1) {
                String s =
                    String.format("Code %x is `%s', %s.",
                                  code,
                                  Character.getName(code),
                                  new String(Character.toChars(code)));
                System.out.println(s);
                code = reader.read();
            }
        } catch (IOException e) {
            // report the failure instead of silently swallowing it
            System.err.println("Error: " + e);
        }
    }
}

It behaves as follows:

$ java TestChars 
> keyboard ⌨. pizza 🍕
Code 6b is `LATIN SMALL LETTER K', k.
Code 65 is `LATIN SMALL LETTER E', e.
Code 79 is `LATIN SMALL LETTER Y', y.
Code 62 is `LATIN SMALL LETTER B', b.
Code 6f is `LATIN SMALL LETTER O', o.
Code 61 is `LATIN SMALL LETTER A', a.
Code 72 is `LATIN SMALL LETTER R', r.
Code 64 is `LATIN SMALL LETTER D', d.
Code 20 is `SPACE', .
Code 2328 is `KEYBOARD', ⌨.
Code 2e is `FULL STOP', ..
Code 20 is `SPACE', .
Code 70 is `LATIN SMALL LETTER P', p.
Code 69 is `LATIN SMALL LETTER I', i.
Code 7a is `LATIN SMALL LETTER Z', z.
Code 7a is `LATIN SMALL LETTER Z', z.
Code 61 is `LATIN SMALL LETTER A', a.
Code 20 is `SPACE', .
Code d83c is `HIGH SURROGATES D83C', ?.
Code df55 is `LOW SURROGATES DF55', ?.
Code a is `LINE FEED (LF)',
.

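My problem is that the two halves of the surrogate pair that make up the pizza emoji are read separately. I want to read the symbol into a single int and be done with it.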
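The pair in the output above, D83C followed by DF55, encodes the single code point U+1F355 (127829 decimal). The following small illustrative sketch spells out the standard UTF-16 decoding formula by hand and checks it against the JDK's Character.toCodePoint. The class SurrogateDemo and its combine method are hypothetical names used only for illustration.

```java
public class SurrogateDemo {
    // Hypothetical helper: the UTF-16 surrogate-pair formula written out by hand.
    static int combine(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        String pizza = "\uD83C\uDF55"; // U+1F355 stored as two UTF-16 code units
        char high = pizza.charAt(0);
        char low = pizza.charAt(1);

        System.out.println(pizza.length());                          // 2 chars
        System.out.println(pizza.codePointCount(0, pizza.length())); // 1 code point
        System.out.println(Character.isHighSurrogate(high)
                && Character.isLowSurrogate(low));                    // true

        System.out.printf("%x%n", combine(high, low));               // 1f355
        System.out.printf("%x%n", Character.toCodePoint(high, low)); // 1f355
        System.out.println(pizza.codePointAt(0));                    // 127829
    }
}
```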

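Question: Is there a Reader class that automatically combines surrogate pairs into characters as it reads? (Ideally it would also throw an exception if the input is malformed.)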
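On the malformed-input part: by default, InputStreamReader does not throw on bad input. When it is constructed with a Charset, as in the program above, it silently replaces malformed bytes with U+FFFD. One way to get an exception instead is to pass a CharsetDecoder configured with CodingErrorAction.REPORT. With that decoder, read() throws MalformedInputException, which is a CharacterCodingException. The following is a minimal sketch that assumes UTF-8 input. StrictUtf8, strictReader and decodes are hypothetical names. Note that this only guards against invalid bytes; on its own, it still hands back surrogate halves one char at a time.

```java
import java.io.*;
import java.nio.charset.*;

public class StrictUtf8 {
    // Hypothetical helper. Built from a Charset, InputStreamReader substitutes
    // U+FFFD for malformed bytes; a decoder set to REPORT makes read() throw.
    static Reader strictReader(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    // Hypothetical helper: drains the bytes through the strict reader and
    // returns true if they decode cleanly, false on malformed input.
    static boolean decodes(byte[] bytes) {
        try (Reader r = strictReader(new ByteArrayInputStream(bytes))) {
            while (r.read() != -1) {
                // discard the chars; only a decoding failure matters here
            }
            return true;
        } catch (CharacterCodingException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] good = "pizza \uD83C\uDF55".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {'p', (byte) 0xFF, 'q'}; // 0xFF never occurs in valid UTF-8
        System.out.println(decodes(good)); // true
        System.out.println(decodes(bad));  // false
    }
}
```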

I know I could combine the pairs myself, but I'd rather not reinvent the wheel.
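For comparison, combining the pairs yourself takes only a few lines. The following is a minimal sketch that returns one int per code point and throws an exception on an unpaired surrogate. CodePointReader is a hypothetical class, not part of the JDK. Throwing CharConversionException is just one reasonable choice of exception.

```java
import java.io.*;
import java.util.stream.IntStream;

// Hypothetical helper class (not in the JDK): one Unicode code point per read().
public class CodePointReader implements Closeable {
    private final Reader in;

    public CodePointReader(Reader in) {
        this.in = in;
    }

    // Returns the next code point, or -1 at end of stream.
    // Throws CharConversionException if a surrogate is unpaired.
    public int read() throws IOException {
        int c = in.read();
        if (c == -1 || !Character.isSurrogate((char) c)) {
            return c; // end of stream, or a BMP character that stands alone
        }
        if (Character.isLowSurrogate((char) c)) {
            throw new CharConversionException("unpaired low surrogate " + Integer.toHexString(c));
        }
        int d = in.read(); // c is a high surrogate, so a low surrogate must follow
        if (d == -1 || !Character.isLowSurrogate((char) d)) {
            throw new CharConversionException("unpaired high surrogate " + Integer.toHexString(c));
        }
        return Character.toCodePoint((char) c, (char) d);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Hypothetical convenience method: all code points of a string.
    static int[] readAll(String text) {
        try (CodePointReader r = new CodePointReader(new StringReader(text))) {
            IntStream.Builder out = IntStream.builder();
            for (int cp = r.read(); cp != -1; cp = r.read()) {
                out.add(cp);
            }
            return out.build().toArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int cp : readAll("k\u2328\uD83C\uDF55")) {
            System.out.printf("%x%n", cp); // prints 6b, then 2328, then 1f355
        }
    }
}
```

The sketch deliberately does not extend Reader. Reader.read() is specified to return a single char in the range 0 to 65535, and a supplementary code point such as 0x1F355 does not fit in that range.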

Best answer

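If you take advantage of the fact that String has a method, codePoints(), that returns a stream of code points, you don't have to deal with surrogate pairs yourself: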

import java.io.*;
import java.nio.charset.StandardCharsets;

class cptest {
    public static void main(String[] args) {
        try (BufferedReader br = new BufferedReader(
                 new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            // Each line is a String; String.codePoints() joins surrogate pairs,
            // and flatMapToInt concatenates the per-line streams into one IntStream.
            br.lines().flatMapToInt(String::codePoints).forEach(cptest::print);
        } catch (Exception e) {
            System.err.println("Error: " + e);
        }
    }

    private static void print(int cp) {
        String s = new String(Character.toChars(cp));
        System.out.println("Character " + cp + ": " + s);
    }
}

This produces:

$ java cptest <<< "keyboard ⌨. pizza 🍕"
Character 107: k
Character 101: e
Character 121: y
Character 98: b
Character 111: o
Character 97: a
Character 114: r
Character 100: d
Character 32:
Character 9000: ⌨
Character 46: .
Character 32:
Character 112: p
Character 105: i
Character 122: z
Character 122: z
Character 97: a
Character 32:
Character 127829: 🍕
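There is one difference from the question's program. BufferedReader.lines() strips line terminators, so the LINE FEED that the question's output reports never appears here. If line terminators matter, one option is to decode the whole input in one step. The following is a minimal sketch that assumes Java 9+ (for InputStream.readAllBytes) and input small enough to hold in memory. WholeInputCodePoints is a hypothetical class name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WholeInputCodePoints {
    // Decodes all bytes at once, so line terminators are kept
    // (BufferedReader.lines() strips them). new String(byte[], Charset)
    // substitutes U+FFFD for malformed input rather than throwing.
    static int[] codePoints(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).codePoints().toArray();
    }

    public static void main(String[] args) throws IOException {
        // readAllBytes() needs Java 9+ and keeps the entire input in memory.
        for (int cp : codePoints(System.in.readAllBytes())) {
            System.out.println("Character " + cp + ": " + new String(Character.toChars(cp)));
        }
    }
}
```

When run with `<<<` as above, this version would also print a final `Character 10` for the newline that the here-string appends.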

Original question on Stack Overflow: https://stackoverflow.com/questions/53270963/
