
C# - Performance of Func<T> and inheritance

Reposted. Author: 可可西里. Updated: 2023-11-01 09:08:12

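I have been having trouble understanding the performance characteristics of using Func<...> throughout my code together with inheritance and generics, a combination I find myself using all the time.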

Let me start with a minimal test case so we all know what we are talking about. Then I will post the results, and then explain what I expected and why.

Minimal test case

using System;

public class GenericsTest2 : GenericsTest<int>
{
    static void Main(string[] args)
    {
        GenericsTest2 at = new GenericsTest2();

        at.test(at.func);                        // 1: inherited field, lambda calling generic Check
        at.test(at.Check);                       // 2: method group, generic base method
        at.test(at.func2);                       // 3: own field, same lambda as test 1
        at.test(at.Check2);                      // 4: method group, non-generic method
        at.test((a) => a.Equals(default(int)));  // 5: inline lambda
        Console.ReadLine();
    }

    public GenericsTest2()
    {
        func = func2 = (a) => Check(a);
    }

    protected Func<int, bool> func2;

    public bool Check2(int value)
    {
        return value.Equals(default(int));
    }

    public void test(Func<int, bool> func)
    {
        // The timing helper prints the elapsed time when it is disposed.
        using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
        {
            for (int i = 0; i < 100000000; ++i)
            {
                func(i);
            }
        }
    }
}

public class GenericsTest<T>
{
    public bool Check(T value)
    {
        return value.Equals(default(T));
    }

    protected Func<T, bool> func;
}

// Custom timing helper; not System.Diagnostics.Stopwatch.
public class Stopwatch : IDisposable
{
    public Stopwatch(Action<TimeSpan> act)
    {
        this.act = act;
        this.start = DateTime.UtcNow;
    }

    private Action<TimeSpan> act;
    private DateTime start;

    public void Dispose()
    {
        act(DateTime.UtcNow.Subtract(start));
    }
}

Results

Took 2.50s -> at.test(at.func);
Took 1.97s -> at.test(at.Check);
Took 2.48s -> at.test(at.func2);
Took 0.72s -> at.test(at.Check2);
Took 0.81s -> at.test((a) => a.Equals(default(int)));

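What I expected and why

I expected this code to run at exactly the same speed for all 5 methods. More precisely, I expected it to be even faster than any of them, namely just as fast as: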

using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
{
    for (int i = 0; i < 100000000; ++i)
    {
        bool b = i.Equals(default(int));
    }
}
// this takes 0.32s ?!?

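I expected it to take 0.32s because I see no reason for the JIT compiler not to inline the code in this particular case.

On closer inspection, I do not understand these performance numbers at all: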

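  • at.func is passed to the function and cannot change during execution. Why is it not inlined?
  • at.Check is apparently slower than at.Check2, although neither can be overridden and the IL of at.Check is completely fixed in the case of class GenericsTest2.
  • I see no reason for Func<int, bool> to be slower when passing an inline Func than when passing a method that is converted to a Func.
  • Why is the difference between test cases 2 and 3 a whopping 0.5s, while the difference between cases 4 and 5 is 0.1s? Shouldn't they be the same?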

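Question

I would really like to understand this. What is going on here that makes using a generic base class as much as 10x slower than inlining the whole thing?

So, basically, the question is: why is this happening, and how can I fix it?

UPDATE

Based on all the comments so far (thanks!) I did some more digging.

First, here is a new set of results after repeating the tests with a loop 5x larger and executing them 4 times. I used the Diagnostics stopwatch and added more tests (with descriptions).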

(Baseline implementation took 2.61s)

--- Run 0 ---
Took 3.00s for (a) => at.Check2(a)
Took 12.04s for Check3<int>
Took 12.51s for (a) => GenericsTest2.Check(a)
Took 13.74s for at.func
Took 16.07s for GenericsTest2.Check
Took 12.99s for at.func2
Took 1.47s for at.Check2
Took 2.31s for (a) => a.Equals(default(int))
--- Run 1 ---
Took 3.18s for (a) => at.Check2(a)
Took 13.29s for Check3<int>
Took 14.10s for (a) => GenericsTest2.Check(a)
Took 13.54s for at.func
Took 13.48s for GenericsTest2.Check
Took 13.89s for at.func2
Took 1.94s for at.Check2
Took 2.61s for (a) => a.Equals(default(int))
--- Run 2 ---
Took 3.18s for (a) => at.Check2(a)
Took 12.91s for Check3<int>
Took 15.20s for (a) => GenericsTest2.Check(a)
Took 12.90s for at.func
Took 13.79s for GenericsTest2.Check
Took 14.52s for at.func2
Took 2.02s for at.Check2
Took 2.67s for (a) => a.Equals(default(int))
--- Run 3 ---
Took 3.17s for (a) => at.Check2(a)
Took 12.69s for Check3<int>
Took 13.58s for (a) => GenericsTest2.Check(a)
Took 14.27s for at.func
Took 12.82s for GenericsTest2.Check
Took 14.03s for at.func2
Took 1.32s for at.Check2
Took 1.70s for (a) => a.Equals(default(int))

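What I notice from these results is that things get much slower the moment generics are involved. Digging into the IL, I found this for the non-generic implementation: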

L_0000: ldarga.s 'value'
L_0002: ldc.i4.0
L_0003: call instance bool [mscorlib]System.Int32::Equals(int32)
L_0008: ret

And for all the generic implementations:

L_0000: ldarga.s 'value'
L_0002: ldloca.s CS$0$0000
L_0004: initobj !T
L_000a: ldloc.0
L_000b: box !T
L_0010: constrained. !T
L_0016: callvirt instance bool [mscorlib]System.Object::Equals(object)
L_001b: ret

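While most of this can be optimized, I think the callvirt may be the problem here.

In an attempt to make it faster, I added the "T : IEquatable<T>" constraint to the definition of the method. The result is: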

L_0011: callvirt instance bool [mscorlib]System.IEquatable`1<!T>::Equals(!0)

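While I now understand more about the performance (it probably cannot be inlined because it creates a vtable lookup), I am still confused: why doesn't it simply call T::Equals? After all, I did specify that it will be there...

Best answer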
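
To make the effect of the constraint concrete, here is a minimal sketch of the three ways the check can be written. The class and method names (DefaultChecks, IsDefaultBoxed, IsDefaultEquatable, IsDefaultComparer) are hypothetical and do not come from the post. One relevant detail: C# compiles a generic method to IL once for all T, so the IL can only name a member of Object or of the constraint; the constrained. prefix is what lets the runtime bind the call to the value type's own method when T is a value type. Whether the JIT then inlines it depends on the runtime version.

```csharp
using System;
using System.Collections.Generic;

public static class DefaultChecks
{
    // No constraint: Equals binds to Object.Equals(object),
    // so default(T) is boxed when T is a value type.
    public static bool IsDefaultBoxed<T>(T value)
    {
        return value.Equals(default(T));
    }

    // With the constraint, Equals binds to IEquatable<T>.Equals(T),
    // so the argument is passed without boxing.
    public static bool IsDefaultEquatable<T>(T value) where T : IEquatable<T>
    {
        return value.Equals(default(T));
    }

    // Alternative that needs no constraint: EqualityComparer<T>.Default
    // uses IEquatable<T> when T implements it.
    public static bool IsDefaultComparer<T>(T value)
    {
        return EqualityComparer<T>.Default.Equals(value, default(T));
    }

    public static void Main()
    {
        Console.WriteLine(IsDefaultBoxed(0));      // prints True
        Console.WriteLine(IsDefaultEquatable(5));  // prints False
        Console.WriteLine(IsDefaultComparer(0));   // prints True
    }
}
```

All three return the same answer for int; they differ only in which Equals overload the call binds to.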

Always run a microbenchmark 3 times. The first run triggers the JIT and should be ruled out. Check that the 2nd and 3rd runs are equal. This gives:

... run ...
Took 0.79s
Took 0.63s
Took 0.74s
Took 0.24s
Took 0.32s
... run ...
Took 0.73s
Took 0.63s
Took 0.73s
Took 0.24s
Took 0.33s
... run ...
Took 0.74s
Took 0.63s
Took 0.74s
Took 0.25s
Took 0.33s
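
The warm-up-then-repeat procedure described above could be sketched roughly as follows. This is an assumption about how such a harness might look, not the code used for the numbers in this answer; MicroBench and Measure are hypothetical names. It uses System.Diagnostics.Stopwatch rather than the DateTime-based helper from the question.

```csharp
using System;
using System.Diagnostics;

public static class MicroBench
{
    // Runs `runs + 1` timed passes over `func` and drops the first one,
    // since that pass also pays for JIT compilation.
    public static TimeSpan[] Measure(Func<int, bool> func, int iterations, int runs)
    {
        TimeSpan[] kept = new TimeSpan[runs];
        int hits = 0; // consumed below so the loop has an observable result

        for (int pass = 0; pass <= runs; ++pass)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; ++i)
            {
                if (func(i)) ++hits;
            }
            sw.Stop();

            // Pass 0 is the warm-up and is not recorded.
            if (pass > 0) kept[pass - 1] = sw.Elapsed;
        }

        GC.KeepAlive(hits);
        return kept;
    }

    public static void Main()
    {
        TimeSpan[] times = Measure(a => a.Equals(default(int)), 100000000, 2);
        foreach (TimeSpan t in times)
        {
            Console.WriteLine("Took {0:0.00}s", t.TotalSeconds);
        }
    }
}
```

If the kept passes disagree noticeably, the measurement is probably still being disturbed by something other than the code under test. For anything beyond a quick check, a dedicated library such as BenchmarkDotNet is likely the safer choice.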

The line

func = func2 = (a) => Check(a);

adds an extra function call. Replacing it with

func = func2 = this.Check;

gives:

... 1. run ...
Took 0.64s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s
... 2. run ...
Took 0.63s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s
... 3. run ...
Took 0.63s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s

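This shows that the (JIT?) effect between the 1. and 2. run disappeared once the extra function call was removed. The first 3 tests are now equal.

In tests 4 and 5, the compiler can inline the function argument into void test(Func<>), while in tests 1 to 3 the compiler would have to go a long way to figure out that the arguments are constant. Sometimes the compiler is subject to constraints that are not easy to see from a coder's perspective, such as .Net and JIT constraints that stem from the dynamic nature of .Net programs compared to a binary produced from C++. In any case, it is the inlining of the function argument that makes the difference here.

The difference between 4 and 5? Well, test 5 looks like something the compiler could also inline very easily. Maybe it builds a context for closures and resolves it in a more complicated way than necessary. I did not dig into the MSIL to find out.

The tests above used .Net 4.5. Here are the results with 3.5, showing that the compiler has become better at inlining: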
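
The extra call can be made visible by inspecting the two delegates. The following is a small sketch with hypothetical names (Checker, DelegateDemo, Wrapped, Direct): a delegate created from a method group points straight at Check, while a delegate created from a lambda points at a compiler-generated method that in turn calls Check.

```csharp
using System;

public class Checker
{
    public bool Check(int value)
    {
        return value.Equals(default(int));
    }
}

public static class DelegateDemo
{
    // Lambda wrapper: the delegate targets a compiler-generated method
    // whose body calls c.Check, so every invocation makes two calls.
    public static Func<int, bool> Wrapped(Checker c)
    {
        return a => c.Check(a);
    }

    // Method group conversion: the delegate targets Check itself.
    public static Func<int, bool> Direct(Checker c)
    {
        return c.Check;
    }

    public static void Main()
    {
        Checker c = new Checker();
        Console.WriteLine(Direct(c).Method.Name);             // prints Check
        Console.WriteLine(Wrapped(c).Method.Name == "Check"); // prints False
        Console.WriteLine(ReferenceEquals(Direct(c).Target, c)); // prints True
    }
}
```

Both delegates return the same results; only the number of hops per invocation differs.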

... 1. run ...
Took 1.06s
Took 1.06s
Took 1.06s
Took 0.24s
Took 0.27s
... 2. run ...
Took 1.06s
Took 1.08s
Took 1.06s
Took 0.25s
Took 0.27s
... 3. run ...
Took 1.05s
Took 1.06s
Took 1.05s
Took 0.24s
Took 0.27s

And with .Net 4:

... 1. run ...
Took 0.97s
Took 0.97s
Took 0.96s
Took 0.22s
Took 0.30s
... 2. run ...
Took 0.96s
Took 0.96s
Took 0.96s
Took 0.22s
Took 0.30s
... 3. run ...
Took 0.97s
Took 0.96s
Took 0.96s
Took 0.22s
Took 0.30s

Now changing GenericsTest<> to a non-generic GenericsTest!!

... 1. run ...
Took 0.28s
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.27s
... 2. run ...
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.27s
... 3. run ...
Took 0.25s
Took 0.25s
Took 0.25s
Took 0.24s
Took 0.27s

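Well, this is a surprise from the C# compiler. It is similar to what I ran into when sealing classes to avoid virtual function calls. Maybe Eric Lippert has something to say about this?

Replacing the inheritance with aggregation brings the performance back. I have learned to never use inheritance (well, only very rarely), and I strongly recommend avoiding it at least in this case. (This is my pragmatic solution to the problem; no flame war intended.) I use interfaces rigorously throughout, and they carry no performance penalty.

Regarding "C# - Performance of Func<T> and inheritance", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15669358/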
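
The answer does not show its aggregation-based version, so the following is only a guess at the general shape. Every name in it (ICheck<T>, IntCheck, ComposedTest, CheckFunc) is hypothetical. The idea is that the class holds a non-generic check object as a field instead of deriving from GenericsTest<int>. No timing claim is attached to this sketch; it would need to be measured on the target runtime.

```csharp
using System;

// Hypothetical interface; not part of the original post.
public interface ICheck<T>
{
    bool IsDefault(T value);
}

// Non-generic implementation for int, sealed so nothing can override it.
public sealed class IntCheck : ICheck<int>
{
    public bool IsDefault(int value)
    {
        return value.Equals(default(int));
    }
}

// Holds the check as a field (aggregation) instead of
// deriving from a generic base class.
public sealed class ComposedTest
{
    private readonly IntCheck check = new IntCheck();

    // Method group conversion, so the delegate targets IsDefault directly.
    public Func<int, bool> CheckFunc
    {
        get { return check.IsDefault; }
    }
}

public static class CompositionDemo
{
    public static void Main()
    {
        ComposedTest t = new ComposedTest();
        Console.WriteLine(t.CheckFunc(0)); // prints True
        Console.WriteLine(t.CheckFunc(7)); // prints False
    }
}
```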
