
c# - Getting a substring from a string using a culture-sensitive comparison

Reposted. Author: 太空狗. Updated: 2023-10-29 21:53:20

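Is there a way to get the matched substring from a string using a culture-sensitive equality comparison? For example, under the en-US culture, æ and ae are considered equal. "Encyclopædia".IndexOf("aed") evaluates to 8, indicating a match; however, is there a way to extract the matched substring, æd, that does not involve iterating over the source string? Note that the lengths of the sought and the matched substrings may differ by several characters.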
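To make the difficulty concrete, here is a minimal sketch of the naive approach, which assumes the match is exactly as long as the search term. The `NaiveExtraction` class and its `Extract` method are hypothetical names used only for illustration. Whether "æ" and "ae" actually compare as equal depends on the culture and on the globalization backend of the runtime, so no output is claimed here.

```csharp
using System;

public static class NaiveExtraction
{
    // Hypothetical helper: assumes the matched text has the same length
    // as the search term, which a culture-sensitive match does not guarantee.
    public static string Extract(string source, string value, StringComparison comparisonType)
    {
        int index = source.IndexOf(value, comparisonType);
        if (index < 0)
            return string.Empty;

        // Wrong whenever the matched text and the search term differ in length.
        int length = Math.Min(value.Length, source.Length - index);
        return source.Substring(index, length);
    }
}
```

Where the culture treats æ and ae as equal, `NaiveExtraction.Extract("Encyclopædia", "aed", StringComparison.CurrentCulture)` would take three characters starting at the match and so include one character beyond æd.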

Best answer

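I eventually solved this by first calling IndexOf to get the starting position of the match, and then repeatedly probing to determine its length. I optimized for the hot path in which the match has the same length as the specified substring; in that case, only a single comparison is performed.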

using System;
using System.Globalization;

public static class StringExtensions
{
    // Searches the whole of source.
    public static void Find(this string source, string substring, StringComparison comparisonType, out int matchIndex, out int matchLength)
    {
        Find(source, substring, 0, source.Length, comparisonType, out matchIndex, out matchLength);
    }

    // Searches from searchIndex to the end of source.
    public static void Find(this string source, string substring, int searchIndex, StringComparison comparisonType, out int matchIndex, out int matchLength)
    {
        Find(source, substring, searchIndex, source.Length - searchIndex, comparisonType, out matchIndex, out matchLength);
    }

    // Searches searchLength characters starting at searchIndex.
    // Sets both out parameters to -1 when there is no match.
    public static void Find(this string source, string substring, int searchIndex, int searchLength, StringComparison comparisonType, out int matchIndex, out int matchLength)
    {
        matchIndex = source.IndexOf(substring, searchIndex, searchLength, comparisonType);
        if (matchIndex == -1)
        {
            matchLength = -1;
            return;
        }

        matchLength = FindMatchLength(source, substring, searchIndex, searchLength, comparisonType, matchIndex);

        // Defensive programming, but should never happen
        if (matchLength == -1)
            matchIndex = -1;
    }

    private static int FindMatchLength(string source, string substring, int searchIndex, int searchLength, StringComparison comparisonType, int matchIndex)
    {
        // The match cannot extend past the end of the searched range.
        int matchLengthMaximum = searchLength - (matchIndex - searchIndex);
        int matchLengthInitial = Math.Min(substring.Length, matchLengthMaximum);

        // Hot path: match length is same as substring length.
        if (Compare(source, matchIndex, matchLengthInitial, substring, 0, substring.Length, comparisonType) == 0)
            return matchLengthInitial;

        // Otherwise probe outward from the initial length, alternating
        // between shorter and longer candidates.
        int matchLengthDecrementing = matchLengthInitial - 1;
        int matchLengthIncrementing = matchLengthInitial + 1;

        while (matchLengthDecrementing >= 0 || matchLengthIncrementing <= matchLengthMaximum)
        {
            if (matchLengthDecrementing >= 0)
            {
                if (Compare(source, matchIndex, matchLengthDecrementing, substring, 0, substring.Length, comparisonType) == 0)
                    return matchLengthDecrementing;

                matchLengthDecrementing--;
            }

            if (matchLengthIncrementing <= matchLengthMaximum)
            {
                if (Compare(source, matchIndex, matchLengthIncrementing, substring, 0, substring.Length, comparisonType) == 0)
                    return matchLengthIncrementing;

                matchLengthIncrementing++;
            }
        }

        // Should never happen
        return -1;
    }

    // Maps a StringComparison value to the equivalent CompareInfo.Compare call on substrings.
    private static int Compare(string strA, int indexA, int lengthA, string strB, int indexB, int lengthB, StringComparison comparisonType)
    {
        switch (comparisonType)
        {
            case StringComparison.CurrentCulture:
                return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.None);

            case StringComparison.CurrentCultureIgnoreCase:
                return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.IgnoreCase);

            case StringComparison.InvariantCulture:
                return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.None);

            case StringComparison.InvariantCultureIgnoreCase:
                return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.IgnoreCase);

            case StringComparison.Ordinal:
                return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.Ordinal);

            case StringComparison.OrdinalIgnoreCase:
                return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.OrdinalIgnoreCase);

            default:
                throw new ArgumentException("The string comparison type passed in is currently not supported.", nameof(comparisonType));
        }
    }
}
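The order in which FindMatchLength tries candidate lengths can be easier to follow in isolation. The sketch below only enumerates that order and performs no string comparison; `ProbeOrder` and `CandidateLengths` are hypothetical names that are not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;

public static class ProbeOrder
{
    // Hypothetical helper: yields candidate match lengths in the same order
    // as the answer's FindMatchLength tries them. maximum is the number of
    // characters between the match index and the end of the searched range.
    public static IEnumerable<int> CandidateLengths(int substringLength, int maximum)
    {
        int initial = Math.Min(substringLength, maximum);
        yield return initial; // hot path: same length as the search term

        int down = initial - 1;
        int up = initial + 1;
        while (down >= 0 || up <= maximum)
        {
            if (down >= 0)
                yield return down--; // next shorter candidate
            if (up <= maximum)
                yield return up++;   // next longer candidate
        }
    }
}
```

For the question's example (a three-character search term and four characters remaining from index 8), `CandidateLengths(3, 4)` yields 3, 2, 4, 1, 0, so a two-character match such as æd would be confirmed on the second comparison.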

Example usage:

int index, length;
source.Find(remove, StringComparison.CurrentCulture, out index, out length);
string clean = index < 0 ? source : source.Remove(index, length);
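As a possible alternative on newer runtimes: .NET 5 and later provide a span-based overload, CompareInfo.IndexOf(ReadOnlySpan<char>, ReadOnlySpan<char>, CompareOptions, out int matchLength), which reports the length of the matched text directly. The following is a minimal sketch assuming .NET 5 or later; the `MatchFinder` class and `TryFindMatch` method are hypothetical names, and it takes a CultureInfo and CompareOptions rather than the StringComparison used above.

```csharp
using System;
using System.Globalization;

public static class MatchFinder
{
    // Hypothetical helper, assuming .NET 5 or later: returns true and the
    // matched slice of source when value is found under the given culture.
    public static bool TryFindMatch(string source, string value, CultureInfo culture, CompareOptions options, out string match)
    {
        int index = culture.CompareInfo.IndexOf(source.AsSpan(), value.AsSpan(), options, out int matchLength);
        if (index < 0)
        {
            match = string.Empty;
            return false;
        }

        // matchLength is the length of the matched text in source,
        // which may differ from value.Length.
        match = source.Substring(index, matchLength);
        return true;
    }
}
```

With this overload the runtime supplies the match length, so the repeated comparisons are not needed; the extension methods above remain useful on older frameworks where it is unavailable.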

Regarding "c# - Getting a substring from a string using a culture-sensitive comparison", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35485677/
