
c# - FirstUnmatchedIndex using CurrentCultureIgnoreCase

Repost. Author: 太空狗. Updated: 2023-10-29 23:49:06

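I need to support languages whose input text may contain non-ASCII letters, so I need to implement StringComparison.CurrentCultureIgnoreCase for FirstUnmatchedIndex. Ignoring case is not too bad, but I don't know how to convert combined symbols (ligatures such as æ) to a standard representation before comparing them. Here are some cases where the function should return -1 but returns something else: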

encyclopædia = encyclopaedia
Archæology = Archaeology
ARCHÆOLOGY = archaeology
Archæology = archaeology
Weißbier = WEISSBIER

How can I tell whether a character needs to be expanded, and convert each such character to its expanded form when needed?

/// <summary>
/// Gets the index of the first differing character.
/// </summary>
/// <param name="a">First string</param>
/// <param name="b">Second string</param>
/// <param name="compareSmallest">
/// If true, returns the first difference found, or -1 if the end of either string is reached without finding a difference
/// (i.e. returns -1 if the shorter string is a prefix of the other).
/// Otherwise, returns -1 only if both strings are really the same, and returns the position where the shorter string ends if no difference is found before that.
/// </param>
/// <param name="comparisonType">How the strings are normalized before they are compared</param>
/// <returns>
/// The index of the first difference, or -1 if no difference is found
/// </returns>
public static int FirstUnmatchedIndex(this string a, string b, bool compareSmallest = false, StringComparison comparisonType = StringComparison.CurrentCulture)
{
    //Treat null as empty
    if (String.IsNullOrEmpty(a)) {
        if (String.IsNullOrEmpty(b)) {
            //Equal, both empty.
            return -1;
        } else {
            //If compareSmallest, empty is always found in longest.
            //Otherwise, difference at pos 0.
            return compareSmallest ? -1 : 0;
        }
    }
    //A null b would throw further down; treat it as empty as well.
    b = b ?? String.Empty;
    if (object.ReferenceEquals(a, b)) {
        //Same Ref.
        return -1;
    }

    //Convert strings before compare.
    switch (comparisonType) {
        case StringComparison.CurrentCulture:
            //FIXME
            break;
        case StringComparison.CurrentCultureIgnoreCase:
            //FIXME
            var currentCulture = System.Globalization.CultureInfo.CurrentCulture;
            a = a.ToLower(currentCulture);
            b = b.ToLower(currentCulture);
            break;
        case StringComparison.InvariantCulture:
            //FIXME
            break;
        case StringComparison.InvariantCultureIgnoreCase:
            //FIXME
            a = a.ToLowerInvariant();
            b = b.ToLowerInvariant();
            break;
        case StringComparison.OrdinalIgnoreCase:
            //Parameterless ToLower() uses the current culture; ordinal ignore-case
            //corresponds to invariant upper-casing followed by a binary compare.
            a = a.ToUpperInvariant();
            b = b.ToUpperInvariant();
            break;
        case StringComparison.Ordinal:
            //Ordinal (binary) compare, nothing special to do.
        default:
            break;
    }

    string longStr = a.Length > b.Length ? a : b;
    string shortStr = a.Length > b.Length ? b : a;

    int count = shortStr.Length;
    for (int idx = 0; idx < count; idx++) {
        //FIXME Check if char needs to be expanded ?
        if (shortStr[idx] != longStr[idx]) {
            return idx;
        }
    }
    return compareSmallest || longStr.Length == count ? -1 : count;
}

Best answer

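I'm not sure I understood your question correctly, but you could use a "dictionary + regex" combination. The idea is to build a dictionary of the characters you want to expand and to find them in the input with a regular expression. The following code shows an example of how to do this.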

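Regex explanation:

  • (?i) - enables case-insensitive matching (the same as RegexOptions.IgnoreCase, but inline)
  • [^\p{IsBasicLatin}]+ - matches runs of one or more characters that fall outside the Basic Latin block (\u0000 to \u007F).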
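
To see what this pattern matches on its own, here is a small standalone check, separate from the answer's code. The class and method names (PatternCheck, NonBasicLatinRuns) are made up for this illustration; only the pattern string comes from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Same pattern as in the answer.
    public const string Pattern = @"(?i)[^\p{IsBasicLatin}]+";

    // Returns every run of characters outside the Basic Latin block, in order of appearance.
    public static string[] NonBasicLatinRuns(string input)
    {
        return Regex.Matches(input, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Because of the trailing +, two adjacent non-Latin characters such as "æß" come back as a single match, so a dictionary keyed by single characters would have no entry for that run. Dropping the + would make every character a separate match.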

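The code below calls ToLower so that uppercase non-Latin characters do not have to be added to the dictionary. If you prefer to be explicit, you can instead add both the lowercase and the uppercase characters to the dictionary and remove the ToLower call.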

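using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using static System.Console;

var dic = new Dictionary<string, string>
{
    ["æ"] = "ae",
    ["ß"] = "ss"
};

var words = new[] { "encyclopædia", "Archæology", "ARCHÆOLOGY", "Archæology", "Weißbier" };
var pattern = @"(?i)[^\p{IsBasicLatin}]+";

for (int x = 0; x < words.Length; x++)
{
    // Each match (m.Value) is lower-cased and looked up in the dictionary;
    // a match without an entry is left unchanged instead of throwing KeyNotFoundException.
    words[x] = Regex.Replace(words[x], pattern, m => dic.TryGetValue(m.Value.ToLower(), out var expanded) ? expanded : m.Value);
}
words.ToList().ForEach(WriteLine);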

/*
Output:
encyclopaedia
Archaeology
ARCHaeOLOGY
Archaeology
Weissbier
*/
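
The replacement above produces normalized strings, but the question asks for the index of the first difference, and after an expansion the positions no longer line up with the original text. A minimal sketch of one way to keep them aligned follows, assuming a hand-written expansion table like the answer's dictionary. ExpansionSketch, Expand and FirstUnmatchedIndexExpanded are hypothetical names, and lower-casing with char.ToLowerInvariant is a simplification, not a real culture-aware comparison.

```csharp
using System;
using System.Collections.Generic;

public static class ExpansionSketch
{
    // Hypothetical expansion table; extend it for the characters you need.
    private static readonly Dictionary<char, string> Expansions = new Dictionary<char, string>
    {
        ['æ'] = "ae",
        ['ß'] = "ss"
    };

    // Lower-cases and expands s, remembering for every output character
    // the index of the input character it came from.
    private static List<KeyValuePair<char, int>> Expand(string s)
    {
        var result = new List<KeyValuePair<char, int>>();
        for (int i = 0; i < s.Length; i++)
        {
            char lower = char.ToLowerInvariant(s[i]);
            string expanded;
            if (!Expansions.TryGetValue(lower, out expanded))
                expanded = lower.ToString();
            foreach (char c in expanded)
                result.Add(new KeyValuePair<char, int>(c, i));
        }
        return result;
    }

    // Returns the index in a of the first mismatch, or -1 if the expanded forms are equal.
    public static int FirstUnmatchedIndexExpanded(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;
        var ea = Expand(a);
        var eb = Expand(b);
        int count = Math.Min(ea.Count, eb.Count);
        for (int i = 0; i < count; i++)
        {
            if (ea[i].Key != eb[i].Key)
                return ea[i].Value;
        }
        if (ea.Count == eb.Count)
            return -1;
        // One string ran out first: report where a stops being matched.
        return ea.Count > count ? ea[count].Value : a.Length;
    }
}
```

With this table, every pair listed in the question (for example "Weißbier" and "WEISSBIER") compares as equal and yields -1, while "Archæology" against "Archeology" reports index 4, the position of æ in the first string. The compareSmallest option from the question is left out to keep the sketch short.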

About c# - FirstUnmatchedIndex using CurrentCultureIgnoreCase: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53969220/
