
.net - Understanding a specific CIL/CLR optimization

Reposted. Author: 行者123. Last updated: 2023-12-03 16:53:53

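Edit: I added the ASM at the end.

I believe the best way to learn how to write good code on a platform is to experiment with the platform and thereby understand it. So this question is asked in the spirit of understanding the CLR better, not of attempting nano-optimization.

Nonetheless, it occurred to me that fusing the two operations of setting and evaluating a variable might be faster. It turns out that it is: in the code below, the second loop runs in roughly 60% of the time of the first.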

private sealed class Temp
{
    public int val;
}

private void button13_Click(object sender, EventArgs e)
{
    Temp t = new Temp();
    Temp t1;

    int T1 = Environment.TickCount;

    for (int i = 0; i < 1000000000; i++)
    {
        t1 = t;

        if (t1.val++ == 1000)
        {
            t1.val = 0;
        }
    }

    int T2 = Environment.TickCount;

    for (int i = 0; i < 1000000000; i++)
    {
        if ((t1 = t).val++ == 1000)
        {
            t1.val = 0;
        }
    }

    int T3 = Environment.TickCount;

    MessageBox.Show((T2 - T1).ToString() + Environment.NewLine +
                    (T3 - T2).ToString() + Environment.NewLine +
                    t.val.ToString());
}

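In most cases, the CIL compiler keeps a copy of the assigned value on the evaluation stack, so the store and fetch that would normally be required are not needed. That would explain the apparently significant speedup.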
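To make the two source shapes concrete, here is a minimal console sketch of both loop bodies. In C#, an assignment is an expression whose value is the value that was assigned, so `(t1 = t).val++` increments the field on the same object that `t1` now refers to. The class name `FusedAssignmentDemo`, the method names, and the small iteration count are assumptions made for illustration; they are not part of the original code. The sketch only shows that the two forms compute the same result and says nothing about their speed.

```csharp
using System;

public sealed class Temp
{
    public int val;
}

// Hypothetical helper class, not part of the original question.
public static class FusedAssignmentDemo
{
    // Shape of the first loop: assign first, then evaluate.
    public static int RunSeparate(int iterations)
    {
        Temp t = new Temp();
        Temp t1;
        for (int i = 0; i < iterations; i++)
        {
            t1 = t;
            if (t1.val++ == 1000)
            {
                t1.val = 0;
            }
        }
        return t.val;
    }

    // Shape of the second loop: the assignment is used as an expression.
    public static int RunFused(int iterations)
    {
        Temp t = new Temp();
        Temp t1;
        for (int i = 0; i < iterations; i++)
        {
            if ((t1 = t).val++ == 1000)
            {
                t1.val = 0;
            }
        }
        return t.val;
    }

    public static void Main()
    {
        // Both variants perform the same logical operation,
        // so the final counter values should match.
        int separate = RunSeparate(5000);
        int fused = RunFused(5000);
        Console.WriteLine("separate = " + separate + ", fused = " + fused);
    }
}
```

Because the two methods are logically identical, any timing difference between them has to come from how the code is compiled and laid out, not from what it computes.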

However, the decompiled C# and the IL for this particular code do not do that; they add overhead instead. Yet it is almost twice as fast.

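EDIT2: I physically swapped the two loops and found that the loop that runs second is always about twice as fast. Why? So I added a "warm-up" loop, which made the first loop roughly twice as fast as before. It is essentially the same code (ASM-wise). What is happening behind the scenes?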
{
    Temp t1;
    Temp t = new Temp();
    int T1 = Environment.TickCount;
    for (int i = 0; i < 0x3b9aca00; i++)
    {
        t1 = t;
        if (t1.val++ == 0x3e8)
        {
            t1.val = 0;
        }
    }
    int T2 = Environment.TickCount;
    for (int i = 0; i < 0x3b9aca00; i++)
    {
        Temp temp1 = t1 = t;
        if (temp1.val++ == 0x3e8)
        {
            t1.val = 0;
        }
    }
    int T3 = Environment.TickCount;
    string[] CS$0$0002 = new string[] { (T2 - T1).ToString(), Environment.NewLine, (T3 - T2).ToString(), Environment.NewLine, t.val.ToString() };
    MessageBox.Show(string.Concat(CS$0$0002));
}

Edit: compiled in Release mode for 64-bit .NET 4
L_0000: newobj instance void DIRECT_UI.Form1/Temp::.ctor()
L_0005: stloc.0
L_0006: call int32 [mscorlib]System.Environment::get_TickCount()
L_000b: stloc.2
L_000c: ldc.i4.0
L_000d: stloc.3
L_000e: br.s L_0037
L_0010: ldloc.0
L_0011: stloc.1
L_0012: ldloc.1
L_0013: dup
L_0014: ldfld int32 DIRECT_UI.Form1/Temp::val
L_0019: dup
L_001a: stloc.s CS$0$0000
L_001c: ldc.i4.1
L_001d: add
L_001e: stfld int32 DIRECT_UI.Form1/Temp::val
L_0023: ldloc.s CS$0$0000
L_0025: ldc.i4 0x3e8
L_002a: bne.un.s L_0033
L_002c: ldloc.1
L_002d: ldc.i4.0
L_002e: stfld int32 DIRECT_UI.Form1/Temp::val
L_0033: ldloc.3
L_0034: ldc.i4.1
L_0035: add
L_0036: stloc.3
L_0037: ldloc.3
L_0038: ldc.i4 0x3b9aca00
L_003d: blt.s L_0010
L_003f: call int32 [mscorlib]System.Environment::get_TickCount()
L_0044: stloc.s T2
L_0046: ldc.i4.0
L_0047: stloc.s V_5
L_0049: br.s L_0074
L_004b: ldloc.0
L_004c: dup
L_004d: stloc.1
L_004e: dup
L_004f: ldfld int32 DIRECT_UI.Form1/Temp::val
L_0054: dup
L_0055: stloc.s CS$0$0001
L_0057: ldc.i4.1
L_0058: add
L_0059: stfld int32 DIRECT_UI.Form1/Temp::val
L_005e: ldloc.s CS$0$0001
L_0060: ldc.i4 0x3e8
L_0065: bne.un.s L_006e
L_0067: ldloc.1
L_0068: ldc.i4.0
L_0069: stfld int32 DIRECT_UI.Form1/Temp::val
L_006e: ldloc.s V_5
L_0070: ldc.i4.1
L_0071: add
L_0072: stloc.s V_5
L_0074: ldloc.s V_5
L_0076: ldc.i4 0x3b9aca00
L_007b: blt.s L_004b
L_007d: call int32 [mscorlib]System.Environment::get_TickCount()
L_0082: stloc.s T3
L_0084: ldc.i4.5
L_0085: newarr string
L_008a: stloc.s CS$0$0002
L_008c: ldloc.s CS$0$0002
L_008e: ldc.i4.0
L_008f: ldloc.s T2
L_0091: ldloc.2
L_0092: sub
L_0093: stloc.s CS$0$0003
L_0095: ldloca.s CS$0$0003
L_0097: call instance string [mscorlib]System.Int32::ToString()
L_009c: stelem.ref
L_009d: ldloc.s CS$0$0002
L_009f: ldc.i4.1
L_00a0: call string [mscorlib]System.Environment::get_NewLine()
L_00a5: stelem.ref
L_00a6: ldloc.s CS$0$0002
L_00a8: ldc.i4.2
L_00a9: ldloc.s T3
L_00ab: ldloc.s T2
L_00ad: sub
L_00ae: stloc.s CS$0$0004
L_00b0: ldloca.s CS$0$0004
L_00b2: call instance string [mscorlib]System.Int32::ToString()
L_00b7: stelem.ref
L_00b8: ldloc.s CS$0$0002
L_00ba: ldc.i4.3
L_00bb: call string [mscorlib]System.Environment::get_NewLine()
L_00c0: stelem.ref
L_00c1: ldloc.s CS$0$0002
L_00c3: ldc.i4.4
L_00c4: ldloc.0
L_00c5: ldflda int32 DIRECT_UI.Form1/Temp::val
L_00ca: call instance string [mscorlib]System.Int32::ToString()
L_00cf: stelem.ref
L_00d0: ldloc.s CS$0$0002
L_00d2: call string [mscorlib]System.String::Concat(string[])
L_00d7: call valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(string)
L_00dc: pop
L_00dd: ret

This makes no sense to me. It looks like a de-optimization, yet it runs faster. Can anyone shed some light on this?

ASM:
                t1 = t;
000000ac mov rax,qword ptr [rsp+20h]
000000b1 mov qword ptr [rsp+28h],rax

if (t1.val++ == 1000)
000000b6 mov rax,qword ptr [rsp+28h]
000000bb mov eax,dword ptr [rax+8]
000000be mov dword ptr [rsp+74h],eax
000000c2 mov eax,dword ptr [rsp+74h]
000000c6 mov dword ptr [rsp+44h],eax
000000ca mov ecx,dword ptr [rsp+74h]
000000ce inc ecx
000000d0 mov rax,qword ptr [rsp+28h]
000000d5 mov dword ptr [rax+8],ecx
000000d8 cmp dword ptr [rsp+44h],3E8h
000000e0 jne 00000000000000EE
if ((t1 = t).val++ == 1000)
0000011d mov rax,qword ptr [rsp+20h]
00000122 mov qword ptr [rsp+28h],rax
00000127 mov rax,qword ptr [rsp+20h]
0000012c mov eax,dword ptr [rax+8]
0000012f mov dword ptr [rsp+7Ch],eax
00000133 mov eax,dword ptr [rsp+7Ch]
00000137 mov dword ptr [rsp+48h],eax
0000013b mov ecx,dword ptr [rsp+7Ch]
0000013f inc ecx
00000141 mov rax,qword ptr [rsp+20h]
00000146 mov dword ptr [rax+8],ecx
00000149 cmp dword ptr [rsp+48h],3E8h
00000151 jne 000000000000015F

Best answer

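The generated IL has only a tangential effect on the efficiency of the code. Go to Tools + Options, Debugging, General and untick the "Suppress JIT optimization on module load" option. That enables the JIT optimizer even while you are debugging the program. Make sure you have the Release configuration selected.

Set a breakpoint on button13_Click. Run the program and click the button. Right-click the source code editor window and select "Go To Disassembly".

Note that both loops generate exactly the same machine code, with both the x86 and the x64 jitter. That is of course how it should be: code that performs the same logical operation should produce the same machine code. All is well.

That does not necessarily mean the two loops will execute at exactly the same speed, although they often will. Code alignment is critical.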
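One way to repeat the measurement with less noise is sketched below. It is an assumption-laden console harness, not the original WinForms handler: it uses `Stopwatch` instead of `Environment.TickCount`, calls each loop once as a warm-up so JIT compilation is not timed, repeats the measurement a few times, and reports whether a debugger is attached and whether the assembly was built with the JIT optimizer disabled. The names `LoopTiming`, `TimeSeparate`, `TimeFused`, and `JitOptimizerDisabled` are hypothetical. Moving each loop into its own method also changes where the code is placed in memory, so the numbers will not necessarily match those from the original single method.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public sealed class Temp
{
    public int val;
}

// Hypothetical timing harness, not part of the original question or answer.
public static class LoopTiming
{
    // Times the "assign, then evaluate" loop shape.
    public static long TimeSeparate(Temp t, int iterations)
    {
        Temp t1;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            t1 = t;
            if (t1.val++ == 1000)
            {
                t1.val = 0;
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Times the fused "assign and evaluate" loop shape.
    public static long TimeFused(Temp t, int iterations)
    {
        Temp t1;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if ((t1 = t).val++ == 1000)
            {
                t1.val = 0;
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // True if the assembly carries a DebuggableAttribute that turns the JIT optimizer off.
    public static bool JitOptimizerDisabled(Assembly assembly)
    {
        DebuggableAttribute attr = (DebuggableAttribute)
            Attribute.GetCustomAttribute(assembly, typeof(DebuggableAttribute));
        return attr != null && attr.IsJITOptimizerDisabled;
    }

    public static void Main()
    {
        const int iterations = 100000000; // smaller than the original 1,000,000,000
        Temp t = new Temp();

        Console.WriteLine("Debugger attached: " + Debugger.IsAttached);
        Console.WriteLine("JIT optimizer disabled: " +
                          JitOptimizerDisabled(typeof(LoopTiming).Assembly));

        // Warm-up: make sure both methods are JIT-compiled before timing.
        TimeSeparate(t, 1000);
        TimeFused(t, 1000);

        for (int round = 0; round < 3; round++)
        {
            Console.WriteLine("separate: " + TimeSeparate(t, iterations) + " ms");
            Console.WriteLine("fused:    " + TimeFused(t, iterations) + " ms");
        }
    }
}
```

If the two timings converge when the program is run as a Release build without a debugger attached, that is consistent with the point above that both loops compile to the same machine code; any remaining gap would then have to come from factors outside the source code, such as alignment.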

Regarding ".net - Understanding a specific CIL/CLR optimization", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9019991/
