java - Does Java String.getBytes("UTF-8") preserve lexicographic order?

Reposted. Author: 搜寻专家. Updated: 2023-11-01 03:10:22

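Suppose I have a list of Java Strings sorted in lexicographic order, [s1, s2, s3, s4, ..., sn], and I convert each string to a byte array using the UTF-8 encoding, bx = sx.getBytes("UTF-8"). Is the list of byte arrays [b1, b2, b3, ..., bn] also sorted lexicographically?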

Best answer

Yes. According to RFC 3629:

The byte-value lexicographic sorting order of UTF-8 strings is the same as if ordered by character numbers. Of course this is of limited interest since a sort order based on character numbers is almost never culturally valid.
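The property quoted above can be checked with a small sketch. It rests on two assumptions that are not part of the original answer: the bytes are compared as unsigned values (Java's byte type is signed), and the strings are compared by code point, since String.compareTo compares UTF-16 code units and can disagree with code point order once supplementary characters are involved. The class name Utf8OrderCheck and its helper methods are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class Utf8OrderCheck {

    // Compare two strings by Unicode code point ("character number"),
    // rather than by UTF-16 code unit as String.compareTo does.
    static int compareByCodePoint(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    // Lexicographic comparison of byte arrays, treating each byte as 0..255.
    static int compareUnsignedBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int k = 0; k < n; k++) {
            int x = a[k] & 0xFF;
            int y = b[k] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // True if every pair of strings orders the same way by code point
    // and by unsigned UTF-8 bytes.
    static boolean sameOrder(List<String> strings) {
        for (String s : strings) {
            for (String t : strings) {
                int byCodePoint = Integer.signum(compareByCodePoint(s, t));
                int byBytes = Integer.signum(compareUnsignedBytes(
                        s.getBytes(StandardCharsets.UTF_8),
                        t.getBytes(StandardCharsets.UTF_8)));
                if (byCodePoint != byBytes) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList(
                "", "a", "ab", "b", "\u00E9", "\u4E2D", "\uFFFF",
                new String(Character.toChars(0x10000)),
                new String(Character.toChars(0x10FFFF)));
        System.out.println(sameOrder(samples)); // prints "true"
    }
}
```

The sample list deliberately avoids unpaired surrogates, which getBytes replaces with a substitute byte and which therefore fall outside what the RFC describes.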

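As Ian Roberts points out, the RFC's guarantee holds for "true UTF-8 (such as String.getBytes will give you)", but beware of DataInputStream's fake UTF-8 (the modified UTF-8 used by readUTF and writeUTF), which will sort [U+000000] after [U+000001] and [U+00F000] after [U+10FFFF].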
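The two exceptions can be seen by looking at the bytes directly. The following is a minimal sketch, assuming DataOutputStream.writeUTF as the way to produce modified UTF-8 (it writes a 2-byte length prefix, which the sketch strips); the class name ModifiedUtf8Demo and the helpers modifiedUtf8 and hex are hypothetical names for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {

    // Encode with DataOutputStream.writeUTF and drop the 2-byte length
    // prefix, leaving only the modified UTF-8 payload.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Format bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nul = "\u0000";
        String one = "\u0001";
        String f000 = "\uF000";
        String max = new String(Character.toChars(0x10FFFF));

        // Standard UTF-8, as produced by String.getBytes.
        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));  // 00
        System.out.println(hex(one.getBytes(StandardCharsets.UTF_8)));  // 01
        System.out.println(hex(f000.getBytes(StandardCharsets.UTF_8))); // EF 80 80
        System.out.println(hex(max.getBytes(StandardCharsets.UTF_8)));  // F4 8F BF BF

        // Modified UTF-8, as produced by writeUTF.
        System.out.println(hex(modifiedUtf8(nul)));  // C0 80
        System.out.println(hex(modifiedUtf8(one)));  // 01
        System.out.println(hex(modifiedUtf8(f000))); // EF 80 80
        System.out.println(hex(modifiedUtf8(max)));  // ED AF BF ED BF BF
    }
}
```

In the modified form, U+0000 becomes the two bytes C0 80, which compare greater than 01, and U+10FFFF is written as two separately encoded surrogates starting with ED, which compares less than the EF that starts U+F000. Both comparisons assume unsigned byte values.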

Regarding "java - Does Java String.getBytes("UTF-8") preserve lexicographic order?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11978569/
