javascript - localeCompare returns 0 for different unicode symbols

Reposted. Author: 行者123. Updated: 2023-12-01 15:37:06

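I want to use localeCompare to sort strings strictly, but I found that it returns 0 when given two different unicode characters, wrongly indicating that they are identical, e.g.
ℜ U+211C (alt-08476) BLACK-LETTER CAPITAL R = real part
ℝ U+211D (alt-08477) DOUBLE-STRUCK CAPITAL R = the set of real numbers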

"ℜ".localeCompare("ℝ", "en")   
> 0

"ℜ" === "ℝ"
> false

"ℜ".charCodeAt(0)
> 8476

"ℝ".charCodeAt(0)
> 8477
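I've looked at the documentation, but the defaults are already "sort" (for the usage option) and "variant" (for the sensitivity option), which appear to be the strictest available: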
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Collator/Collator
Is localeCompare unable to give a strict ordering?
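
The defaults mentioned above can be checked directly rather than read off the documentation. This is a minimal sketch, assuming a runtime with Intl support (for example a standard Node.js build or a current browser); the variable name resolved is only illustrative.

```javascript
// Ask a default "en" collator which options it actually resolved to.
const resolved = new Intl.Collator("en").resolvedOptions();

console.log(resolved.usage);       // "sort"
console.log(resolved.sensitivity); // "variant"
```

So there is no stricter sensitivity setting to switch to; a result of 0 here comes from the collation data itself, not from a relaxed option.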

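Best answer

It appears that String.prototype.localeCompare() is correctly reporting that there is no particular difference in order between the two characters, after detecting that they are both non-ASCII versions of a capital letter R.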

console.log(
// two non-0x43 uppercase Cs
'ℂ'.localeCompare('𝑪', 'en'),

// two non-0x5A uppercase Zs
"ℤ".localeCompare('𝗭', 'en'),

// 0x5A ASCII Z precedes both:
"Z".localeCompare('ℤ', 'en'),
"Z".localeCompare('𝗭', 'en'),
);

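Where localeCompare returns 0 because it treats the strings as equivalent, you can fall back to their Unicode position (the < and > operators compare strings by UTF-16 code unit):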

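// Comparator: locale order first, then code unit order as a tie-breaker.
// (a > b) - (a < b) yields 1, -1 or 0, so the result is consistent in both directions.
const sort = (a, b) => a.localeCompare(b) || (a > b) - (a < b);

console.log(
// 1 (C < 𝑪 in localeCompare)
sort('𝑪', 'C'),
// -1 (localeCompare returns 0; falls back to 0x2102 < 0xD835)
sort('ℂ', '𝑪')
);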
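
To show how such a tie-breaking comparator might be used for an actual sort, here is a minimal sketch. The name strictCompare and the sample array are hypothetical, and reusing a single Intl.Collator instance instead of calling localeCompare each time is a design choice of this sketch, not something the answer prescribes.

```javascript
// Hypothetical helper: strictCompare is not part of any library.
const collator = new Intl.Collator("en");

function strictCompare(a, b) {
  // Locale-aware order first; any non-zero result is returned as-is.
  const byLocale = collator.compare(a, b);
  if (byLocale !== 0) return byLocale;
  // Tie-break by UTF-16 code unit order, so only identical strings yield 0.
  return a < b ? -1 : a > b ? 1 : 0;
}

// Sample data (an assumption for illustration): the two symbols from the
// question plus two ASCII letters.
const symbols = ["ℝ", "Z", "ℜ", "C"];
const sorted = [...symbols].sort(strictCompare);
console.log(sorted);
```

Because the tie-breaker returns a signed result in both directions, swapping the arguments flips the sign, which is what Array.prototype.sort expects from a consistent comparator.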

From the ECMAScript spec:

The actual return values are implementation-defined to permit implementers to encode additional information in the value, but the function is required to define a total ordering on all Strings and to return 0 when comparing Strings that are considered canonically equivalent by the Unicode standard.


From the Wikipedia article on Unicode equivalence:

Unicode provides two such notions, canonical equivalence and compatibility. Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed.

For example, the code point U+006E (the Latin lowercase n) followed by U+0303 (the combining tilde ◌̃) is defined by Unicode to be canonically equivalent to the single code point U+00F1 (the lowercase letter ñ of the Spanish alphabet).

Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications such as alphabetizing names or searching, and may be substituted for each other. Similarly, each Hangul syllable block that is encoded as a single character may be equivalently encoded as a combination of a leading conjoining jamo, a vowel conjoining jamo, and, if appropriate, a trailing conjoining jamo.


Unicode equivalence example.
See also: https://unicode.org/reports/tr10/
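
The two notions of equivalence quoted above can be explored with String.prototype.normalize. The sketch below uses the ñ example from the quote for canonical equivalence, then applies the same calls to the symbols from the question. The variable names are illustrative. One observation worth hedging: for ℜ and ℝ, normalize only maps them to "R" under the compatibility form (NFKC), which suggests they are compatibility variants of R rather than canonically equivalent strings, so the 0 from localeCompare may owe more to their collation weights than to canonical equivalence.

```javascript
// Canonical equivalence: "n" + U+0303 (combining tilde) vs. precomposed U+00F1.
const decomposed = "n\u0303";
const precomposed = "\u00F1";
console.log(decomposed === precomposed);                  // false
console.log(decomposed.normalize("NFC") === precomposed); // true
// The spec passage quoted earlier requires 0 for canonically equivalent strings.
console.log(decomposed.localeCompare(precomposed, "en"));

// The symbols from the question.
const blackLetterR = "\u211C";  // ℜ
const doubleStruckR = "\u211D"; // ℝ
// Canonical composition (NFC) leaves the symbol unchanged...
console.log(blackLetterR.normalize("NFC") === blackLetterR); // true
// ...while compatibility normalization (NFKC) folds both to plain "R".
console.log(blackLetterR.normalize("NFKC"));  // "R"
console.log(doubleStruckR.normalize("NFKC")); // "R"
```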

Regarding "javascript - localeCompare returns 0 for different unicode symbols", the original question is on Stack Overflow: https://stackoverflow.com/questions/63024515/
