
.net - Normalized string differs from ToCharArray

Reposted. Author: 行者123. Updated: 2023-12-04 04:38:29

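s2 is a normalized s1.
As strings, s1 and s2 look the same.
s1 and s2 have different GetHashCode values.
String.Compare reports s1 and s2 as equivalent.

s2 as a string has the accent.
s2.ToCharArray strips out the accent.

Why is s2.ToCharArray different from s2 as a string?

I figured it out.
s2 has a length of 4.
The accent is simply split off as a separate char (Int16 = 769, i.e. U+0301).
String.Compare is smart enough to figure that out.

Interestingly, String.Compare figures it out but String.Contains does not.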

// Assumes: using System; using System.Diagnostics; using System.Linq; using System.Text;
string s1 = "xxé";
string s1copy = "xxé";
string s2 = s1.Normalize(NormalizationForm.FormD);
string s2b = "xxe";
char accent = 'é';

Debug.WriteLine(s1); // xxé
Debug.WriteLine(s2); // xxé
Debug.WriteLine(s2b); // xxe

Debug.WriteLine(s1.GetHashCode()); // 424384421
Debug.WriteLine(s1copy.GetHashCode()); // 424384421
Debug.WriteLine(s2.GetHashCode()); // 1057341801
Debug.WriteLine(s2b.GetHashCode()); // 1701495145

Debug.WriteLine(s1.Contains(accent)); // true
Debug.WriteLine(s2.Contains(accent)); // false
Debug.WriteLine(s2b.Contains(accent)); // false

Debug.WriteLine(string.Compare(s1, s1copy).ToString()); // 0
Debug.WriteLine(string.Compare(s1, s2).ToString()); // 0
Debug.WriteLine(string.Compare(s1, s2b).ToString()); // 1
Debug.WriteLine(string.Compare(s2, s2b).ToString()); // 1

Debug.WriteLine(s1.Equals(s1copy)); // true
Debug.WriteLine(s1.Equals(s2)); // false
Debug.WriteLine(s1.Equals(s2b)); // false
Debug.WriteLine(s2.Equals(s2b)); // false

Debug.WriteLine(s1 == s1copy); // true
Debug.WriteLine(s1 == s2); // false
Debug.WriteLine(s1 == s2b); // false
Debug.WriteLine(s2 == s2b); // false

char[] chars1 = s1.ToCharArray();
char[] chars2 = s2.ToCharArray();
char[] chars2b = s2b.ToCharArray();
Debug.WriteLine(chars1.Length.ToString()); // 3
Debug.WriteLine(chars2.Length.ToString()); // 4
Debug.WriteLine(chars2b.Length.ToString()); // 3
Debug.WriteLine(chars1[0].ToString() + " " + ((Int16)chars1[0]).ToString() + " " + chars1[1].ToString() + " " + ((Int16)chars1[1]).ToString() + " " + chars1[2].ToString() + " " + ((Int16)chars1[2]).ToString());
// x 120 x 120 é 233
Debug.WriteLine(chars2[0].ToString() + " " + ((Int16)chars2[0]).ToString() + " " + chars2[1].ToString() + " " + ((Int16)chars2[1]).ToString() + " " + chars2[2].ToString() + " " + ((Int16)chars2[2]).ToString() +" " + chars2[3].ToString() + " " + ((Int16)chars2[3]).ToString());
//x 120 x 120 e 101 ́ 769
Debug.WriteLine(chars2b[0].ToString() + " " + ((Int16)chars2b[0]).ToString() + " " + chars2b[1].ToString() + " " + ((Int16)chars2b[1]).ToString() + " " + chars2b[2].ToString() + " " + ((Int16)chars2b[2]).ToString());
//x 120 x 120 e 101
Debug.WriteLine(chars1.GetHashCode()); // 16098066
Debug.WriteLine(chars2.GetHashCode()); // 53324351
Debug.WriteLine(chars2b.GetHashCode()); // 50785559
Debug.WriteLine(chars1 == chars2); // false
Debug.WriteLine(chars1 == chars2b); // false
Debug.WriteLine(chars2 == chars2b); // false
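
The split between String.Compare and Contains/Equals/== seen above comes from the kind of comparison each one performs: `string.Compare(a, b)` is culture-sensitive, while `==`, `Equals(string)` and `Contains(string)` are ordinal and compare UTF-16 code units one by one. The following is a minimal sketch of that difference, not code from the original post; the class and method names (`ComparisonSketch`, `OrdinalEquals`, and so on) are made up for illustration. The results of the culture-sensitive calls can depend on the runtime's globalization settings, so no output is stated for them.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class ComparisonSketch
{
    // "xx" + U+00E9 (precomposed e-acute), escaped so the form is unambiguous.
    public const string Composed = "xx\u00E9";

    // Canonical decomposition: "xx" + 'e' + U+0301 (combining acute accent).
    public static readonly string Decomposed =
        Composed.Normalize(NormalizationForm.FormD);

    // Ordinal comparison: looks only at the UTF-16 code units.
    public static bool OrdinalEquals(string a, string b) =>
        string.Equals(a, b, StringComparison.Ordinal);

    // Culture-sensitive comparison: applies linguistic collation rules.
    public static int InvariantCompare(string a, string b) =>
        string.Compare(a, b, StringComparison.InvariantCulture);

    // Culture-sensitive substring search, an alternative to the ordinal Contains.
    public static int InvariantIndexOf(string source, string value) =>
        CultureInfo.InvariantCulture.CompareInfo.IndexOf(source, value, CompareOptions.None);

    public static void Main()
    {
        // Ordinal checks: the code units differ, so both print False.
        Console.WriteLine(OrdinalEquals(Composed, Decomposed));
        Console.WriteLine(Decomposed.Contains("\u00E9"));

        // Culture-sensitive checks: results depend on the collation in use.
        Console.WriteLine(InvariantCompare(Composed, Decomposed));
        Console.WriteLine(InvariantIndexOf(Decomposed, "\u00E9"));
    }
}
```

If accent-aware matching is what you want, a culture-sensitive `IndexOf` like the one sketched here, or normalizing both strings to the same form before an ordinal comparison, are two options worth trying.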

Best answer

Why is s2.ToCharArray different from s2 as a string?



It is because of the NormalizationForm you chose. It decomposes xxé into x, x, e, and the combining acute accent (U+0301).

NormalizationForm.FormD:

Indicates that a Unicode string is normalized using full canonical decomposition.



If that is still unclear, here is the definition of Unicode composition:

In the context of Unicode, character composition is the process of replacing the code points of a base letter followed by one or more combining characters into a single precomposed character; and character decomposition is the opposite process.



Essentially, you are decomposing the string into its lowest form, which is the four distinct characters you see.
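
To make the four characters visible, it can help to print each UTF-16 code unit as a code point. This is a small sketch rather than part of the original answer; `CodePointDump` and `Describe` are hypothetical names chosen for the example.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CodePointDump
{
    // Formats every UTF-16 code unit of a string as U+XXXX.
    public static string Describe(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        string composed = "xx\u00E9"; // precomposed e-acute
        string decomposed = composed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(Describe(composed));   // U+0078 U+0078 U+00E9
        Console.WriteLine(Describe(decomposed)); // U+0078 U+0078 U+0065 U+0301
    }
}
```

U+0301 is the value 769 from the question, which is why the last entry of the char array looks like a bare accent.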

It may be clearer if you try to reassemble the char[]:
var s2Compare = new string(chars2);
var isEq = (s2Compare == s2); // true
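
Going one step further, the decomposition can be reversed with NormalizationForm.FormC, and `StringInfo` can count user-perceived characters instead of chars. The sketch below is an illustration under those assumptions; `RecomposeSketch`, `Recompose` and `TextElementCount` are names invented for it.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class RecomposeSketch
{
    // Canonical composition: merges 'e' + U+0301 back into U+00E9.
    public static string Recompose(string s) =>
        s.Normalize(NormalizationForm.FormC);

    // Counts text elements, where a base letter plus its combining marks is one element.
    public static int TextElementCount(string s) =>
        new StringInfo(s).LengthInTextElements;

    public static void Main()
    {
        string s1 = "xx\u00E9";
        string s2 = s1.Normalize(NormalizationForm.FormD);

        Console.WriteLine(s2.Length);                                 // 4
        Console.WriteLine(TextElementCount(s2));                      // 3
        Console.WriteLine(Recompose(s2) == s1);                       // True
        Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));  // True
    }
}
```

Normalizing both sides to the same form before calling `==`, `Equals` or `GetHashCode` is a common way to make those ordinal operations agree for strings like s1 and s2.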

Regarding ".net - Normalized string differs from ToCharArray", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19300155/
