
c# - Distinct on a list of strings ignoring whitespace, diacritics, and case

Reposted. Author: 太空狗. Updated: 2023-10-29 23:49:37

Given the following list of strings:

string[] Itens = new string[] { "hi", " hi   ", "HI", "hí", " Hî", "hi hi", " hí hí ", "olá", "OLÁ", " olá   ", "", "ola", "hola", " holà    ", "aaaa", "áâàa", " aâàa     ", "áaàa", "áâaa ", "aaaa ", "áâaa", "áâaa", };

The result of the Distinct operation should be:

hi, hi hi, olá, , hola, aaaa

The Distinct operation that C# provides for IEnumerable accepts an IEqualityComparer as a parameter, so the comparison can be customized.

The following implementation does the job:

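class LengthHash : IEqualityComparer<string>
{
    // Assumption: the post does not show where Culture is defined; the invariant culture is used here.
    private static readonly CultureInfo Culture = CultureInfo.InvariantCulture;

    public bool Equals(string x, string y)
    {
        if (x == null || y == null) return x == y;

        var xt = x.Trim();
        var yt = y.Trim();

        // Equal trimmed lengths plus a culture-aware IndexOf match stand in for an equality test.
        return xt.Length == yt.Length && Culture.CompareInfo.IndexOf(xt, yt, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) >= 0;
    }

    // Hash on the trimmed length only, so strings with the same trimmed length share a hash code.
    public int GetHashCode(string obj) => obj?.Trim().Length ?? 1;
}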

If the GetHashCode values differ, Equals is not even executed, so a good GetHashCode implementation is important.
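The hash-before-Equals behavior described above can be observed with a small counting wrapper. This is only a sketch: `CountingComparer` and `HashFirstDemo` are hypothetical names that do not appear in the original post, and the equality rule used here (trim plus ordinal ignore-case) is a simplification chosen to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the original post): counts how often
// Distinct calls Equals and GetHashCode for a pluggable hash function.
class CountingComparer : IEqualityComparer<string>
{
    private readonly Func<string, int> _hash;
    public int EqualsCalls;
    public int HashCalls;

    public CountingComparer(Func<string, int> hash) { _hash = hash; }

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        // Simplified equality: ignore surrounding whitespace and case only.
        return string.Equals(x?.Trim(), y?.Trim(), StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string obj)
    {
        HashCalls++;
        return _hash(obj);
    }
}

static class HashFirstDemo
{
    static void Main()
    {
        var items = new[] { "hi", " hi ", "HI" };

        // Constant hash: every item collides, so Equals makes the decision.
        var constant = new CountingComparer(s => 1);
        int distinctWithConstantHash = items.Distinct(constant).Count();
        Console.WriteLine($"constant hash: {distinctWithConstantHash} distinct, {constant.EqualsCalls} Equals calls");

        // Raw string hash: the three strings almost certainly hash differently,
        // so Equals is typically never consulted and nothing is merged.
        var raw = new CountingComparer(s => s.GetHashCode());
        int distinctWithRawHash = items.Distinct(raw).Count();
        Console.WriteLine($"raw hash: {distinctWithRawHash} distinct, {raw.EqualsCalls} Equals calls");
    }
}
```

With the constant hash the three items collapse into one; with the raw hash they usually stay separate even though Equals would have called them equal, which is why the hash must agree with the equality rule.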

I tried replacing GetHashCode with 2 other approaches.

Ignoring the hash

public int GetHashCode(string obj) => 1;

Normalized hash

public int GetHashCode(string obj) => obj?.Trim().Normalize().ToUpperInvariant().GetHashCode() ?? 1;
// Note: this approach doesn't produce the same output.

Besides using a custom IEqualityComparer, I also tried trimming the list before running Distinct with StringComparer.InvariantCultureIgnoreCase, but that produces the same output as the Normalize and Upper version.
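One likely reason the normalized hash gives a different result: `Normalize()` without arguments uses Form C, which composes characters rather than removing diacritics, so "hí" and "hi" still hash differently and Equals is never consulted for them. The sketch below shows a common way to strip combining marks before hashing. `DiacriticsSketch` and `Fold` are hypothetical names, and this folding is not guaranteed to match `CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase` for every input.

```csharp
using System;
using System.Globalization;
using System.Text;

static class DiacriticsSketch
{
    // Hypothetical helper (not from the original post): trims, removes
    // combining marks, and upper-cases, giving a key suitable for hashing.
    public static string Fold(string s)
    {
        // Form D splits an accented letter into base letter + combining mark.
        var decomposed = s.Trim().Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (var c in decomposed)
        {
            // Keep everything except the non-spacing (combining) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC).ToUpperInvariant();
    }

    static void Main()
    {
        // Normalize() alone keeps the accent, so the keys still differ.
        Console.WriteLine("hí".Normalize().ToUpperInvariant() == "HI"); // False

        // After folding, the accented and unaccented variants share one key.
        Console.WriteLine(Fold(" Hî") == "HI");        // True
        Console.WriteLine(Fold("hí") == Fold("HI"));   // True
    }
}
```

A GetHashCode built on such a key, for example `Fold(obj).GetHashCode()`, would put "hi", "hí" and " Hî" in the same bucket, leaving the final decision to Equals.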

Benchmarking plain Distinct, StringComparer.InvariantCultureIgnoreCase, and the 3 custom approaches gives the following results:

                              Method |       Mean |    StdErr |    StdDev |     Median |
------------------------------------ |----------- |---------- |---------- |----------- |
                          RunDefault |  2.2224 us | 0.0242 us | 0.2391 us |  2.1414 us |
                     RunHashAsLength |  6.0765 us | 0.0515 us | 0.1857 us |  6.1235 us |
                       RunIgnoreHash |  6.4078 us | 0.0640 us | 0.6140 us |  6.1982 us |
                   RunNormalizedHash | 14.5941 us | 0.0742 us | 0.3556 us | 14.4983 us |
 RunTrimAndCompareWithStringComparer | 14.4935 us | 0.0213 us | 0.0768 us | 14.5352 us |

The output is:

21 Default: hi,  hi   , HI, hí,  Hî, hi hi,  hí hí , olá, OLÁ,  olá   , , ola, hola,  holà    , aaaa, áâàa,  aâàa     , áaàa, áâaa , aaaa , áâaa
6 HashAsLength: hi, hi hi, olá, , hola, aaaa
6 IgnoreHash: hi, hi hi, olá, , hola, aaaa
15 NormalizedHash: hi, hí, Hî, hi hi, hí hí , olá, , ola, hola, holà , aaaa, áâàa, aâàa , áaàa, áâaa
15 RunTrimAndCompareWithStringComparer: hi, hí, Hî, hi hi, hí hí, olá, , ola, hola, holà, aaaa, áâàa, aâàa, áaàa, áâaa

The full test is available at https://gist.github.com/Flash3001/d50a6b43bba7bc61e3d85734e40dbed9

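The question is: is there a better way to get the desired final list? It could be a different GetHashCode, a different Equals, or some other predefined IEqualityComparer.

Best answer

You can use the methods that the CompareInfo class provides for this purpose, Compare and GetHashCode. That way you can be sure the implementation is consistent. Correctness comes first; performance is secondary.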

class StringEqualityComparer : IEqualityComparer<string>
{
    private readonly CultureInfo _cultureInfo;
    private readonly CompareOptions _options;
    private readonly bool _trim;

    public StringEqualityComparer(CultureInfo cultureInfo,
        CompareOptions options, bool trim)
    {
        _cultureInfo = cultureInfo;
        _options = options;
        _trim = trim;
    }

    public bool Equals(string x, string y)
    {
        if (_trim) { x = x?.Trim(); y = y?.Trim(); }
        // Compare accepts null arguments, so no extra guard is needed here.
        return _cultureInfo.CompareInfo.Compare(x, y, _options) == 0;
    }

    public int GetHashCode(string obj)
    {
        // Added guard: CompareInfo.GetHashCode throws on a null source string.
        if (obj == null) return 0;
        if (_trim) obj = obj.Trim();
        // Uses the same culture and options as Equals, so equal strings hash alike.
        return _cultureInfo.CompareInfo.GetHashCode(obj, _options);
    }
}

Usage example:

var comparer = new StringEqualityComparer(CultureInfo.InvariantCulture,
    CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase, true);
var items = new string[] { "hi", " hi ", "HI", "hí", " Hî", "hi hi", " hí hí ",
    "olá", "OLÁ", " olá ", "", "ola", "hola", " holà ", "aaaa", "áâàa",
    " aâàa ", "áaàa", "áâaa ", "aaaa ", "áâaa", "áâaa", };
Console.WriteLine($"Distinct: {String.Join(", ", items.Distinct(comparer))}");

Output:

Distinct: hi, hi hi, olá, , hola, aaaa
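The consistency the answer relies on can be spot-checked directly against CompareInfo: whenever Compare reports two strings as equal under a given set of options, GetHashCode called with the same options should return the same value for both. The sketch below is an illustration only; `ConsistencyCheck` and `Check` are hypothetical names, and whether accented and unaccented strings actually compare equal depends on the runtime's globalization support (for instance, invariant-globalization mode may behave differently).

```csharp
using System;
using System.Globalization;

static class ConsistencyCheck
{
    // Hypothetical helper (not from the original post): reports whether two
    // strings compare equal and whether their culture-aware hashes match.
    public static (bool Equal, bool SameHash) Check(string a, string b)
    {
        var compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        var options = CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

        bool equal = compareInfo.Compare(a, b, options) == 0;
        bool sameHash = compareInfo.GetHashCode(a, options) == compareInfo.GetHashCode(b, options);
        return (equal, sameHash);
    }

    static void Main()
    {
        foreach (var (a, b) in new[] { ("hí", "HI"), ("olá", "OLA"), ("hola", "ola") })
        {
            var (equal, sameHash) = Check(a, b);
            // Strings that compare equal must never have different hashes.
            Console.WriteLine($"{a} / {b}: equal={equal}, sameHash={sameHash}, consistent={!equal || sameHash}");
        }
    }
}
```

The last column is the property that matters for Distinct: two unequal strings may share a hash, but two equal strings must not differ in theirs.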

Regarding c# - Distinct on a list of strings ignoring whitespace, diacritics, and case, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43106492/
