gpt4 book ai didi

delphi - 为什么 Delphi 中的嵌套子例程会产生性能损失?

转载 作者:行者123 更新时间:2023-12-03 14:42:48 27 4
gpt4 key购买 nike

我们使用的静态分析器has a report上面写着:

Subprograms with local subprograms (OPTI7)

This section lists subprograms that themselves have local subprograms. Especially when these subprograms share local variables, it can have a negative effect on performance.

This guide说:

Do not use nested routines Nested routines (routines within other routines; also known as "local procedures") require some special stack manipulation so that the variables of the outer routine can be seen by the inner routine. This results in a good bit of overhead. Instead of nesting, move the procedure to the unit scoping level and pass the necessary variables - if necessary by reference (use the var keyword) - or make the variable global at the unit scope.

我们有兴趣知道在验证代码时是否应该考虑此报告。 this question的答案建议人们应该分析自己的应用程序以查看是否存在任何性能差异,但对于嵌套例程和普通子例程之间的差异并没有太多说明。

嵌套例程和普通例程之间的实际区别是什么?它如何导致性能损失?

最佳答案

tl;博士

  • 嵌套子例程还有额外的push/pop
  • 打开优化可能会删除这些内容,以便嵌套子例程和普通子例程生成的代码相同
  • 内联会导致为嵌套子例程和普通子例程生成相同的代码
  • 对于参数和局部变量很少的简单例程,即使关闭优化,我们也没有发现性能差异

我编写了一个小测试来确定这一点,其中 GetRTClock 以 1ns 的精度测量当前时间:

function subprogram_main(z : Integer) : Int64;
var
n : Integer;
s : Int64;

function subprogram_aux(n, z : Integer) : Integer;
var
i : Integer;
begin
// Do some useless work on the aux program
for i := 0 to n - 1 do begin
if (i > z) then
z := z + i
else
z := z - i;
end;
Result := z;
end;
begin
s := GetRTClock;

// Do some minor work on the main program
n := z div 100 * 100 + 100;
// Call the aux program
z := subprogram_aux(n, z);

Result := GetRTClock - s;
end;

function normal_aux(n, z : Integer) : Integer;
var
i : Integer;
begin
// Do some useless work on the aux program
for i := 0 to n - 1 do begin
if (i > z) then
z := z + i
else
z := z - i;
end;
Result := z;
end;

function normal_main(z : Integer) : Int64;
var
n : Integer;
s : Int64;
begin
s := GetRTClock;

// Do some minor work on the main program
n := z div 100 * 100 + 100;
// Call the aux program
z := normal_aux(n, z);

Result := GetRTClock - s;
end;

编译为:

subprogram_main

MyFormU.pas.41: begin
005CE7D0 55 push ebp
005CE7D1 8BEC mov ebp,esp
005CE7D3 83C4E0 add esp,-$20
005CE7D6 8945FC mov [ebp-$04],eax
MyFormU.pas.42: s := GetRTClock;
...
MyFormU.pas.45: n := z div 100 * 100 + 100;
...
MyFormU.pas.47: z := subprogram_aux(n, z);
005CE7F8 55 push ebp
005CE7F9 8B55FC mov edx,[ebp-$04]
005CE7FC 8B45EC mov eax,[ebp-$14]
005CE7FF E880FFFFFF call subprogram_aux
005CE804 59 pop ecx
005CE805 8945FC mov [ebp-$04],eax
MyFormU.pas.49: Result := GetRTClock - s;
...

normal_main

MyFormU.pas.70: begin
005CE870 55 push ebp
005CE871 8BEC mov ebp,esp
005CE873 83C4E0 add esp,-$20
005CE876 8945FC mov [ebp-$04],eax
MyFormU.pas.71: s := GetRTClock;
...
MyFormU.pas.74: n := z div 100 * 100 + 100;
...
MyFormU.pas.76: z := normal_aux(n, z);
005CE898 8B55FC mov edx,[ebp-$04]
005CE89B 8B45EC mov eax,[ebp-$14]
005CE89E E881FFFFFF call normal_aux
005CE8A3 8945FC mov [ebp-$04],eax
MyFormU.pas.78: Result := GetRTClock - s;
...

subprogram_aux:

MyFormU.pas.31: begin
005CE784 55 push ebp
005CE785 8BEC mov ebp,esp
005CE787 83C4EC add esp,-$14
005CE78A 8955F8 mov [ebp-$08],edx
005CE78D 8945FC mov [ebp-$04],eax
MyFormU.pas.33: for i := 0 to n - 1 do begin
005CE790 8B45FC mov eax,[ebp-$04]
005CE793 48 dec eax
005CE794 85C0 test eax,eax
005CE796 7C29 jl $005ce7c1
005CE798 40 inc eax
005CE799 8945EC mov [ebp-$14],eax
005CE79C C745F000000000 mov [ebp-$10],$00000000
MyFormU.pas.34: if (i > z) then
005CE7A3 8B45F0 mov eax,[ebp-$10]
005CE7A6 3B45F8 cmp eax,[ebp-$08]
005CE7A9 7E08 jle $005ce7b3
MyFormU.pas.35: z := z + i
005CE7AB 8B45F0 mov eax,[ebp-$10]
005CE7AE 0145F8 add [ebp-$08],eax
005CE7B1 EB06 jmp $005ce7b9
MyFormU.pas.37: z := z - i;
005CE7B3 8B45F0 mov eax,[ebp-$10]
005CE7B6 2945F8 sub [ebp-$08],eax

normal_aux:

MyFormU.pas.55: begin
005CE824 55 push ebp
005CE825 8BEC mov ebp,esp
005CE827 83C4EC add esp,-$14
005CE82A 8955F8 mov [ebp-$08],edx
005CE82D 8945FC mov [ebp-$04],eax
MyFormU.pas.57: for i := 0 to n - 1 do begin
005CE830 8B45FC mov eax,[ebp-$04]
005CE833 48 dec eax
005CE834 85C0 test eax,eax
005CE836 7C29 jl $005ce861
005CE838 40 inc eax
005CE839 8945EC mov [ebp-$14],eax
005CE83C C745F000000000 mov [ebp-$10],$00000000
MyFormU.pas.58: if (i > z) then
005CE843 8B45F0 mov eax,[ebp-$10]
005CE846 3B45F8 cmp eax,[ebp-$08]
005CE849 7E08 jle $005ce853
MyFormU.pas.59: z := z + i
005CE84B 8B45F0 mov eax,[ebp-$10]
005CE84E 0145F8 add [ebp-$08],eax
005CE851 EB06 jmp $005ce859
MyFormU.pas.61: z := z - i;
005CE853 8B45F0 mov eax,[ebp-$10]
005CE856 2945F8 sub [ebp-$08],eax

唯一的区别是一次push和一次pop。如果我们打开优化会发生什么?

MyFormU.pas.47: z := subprogram_aux(n, z);
005CE7C5 8BD3 mov edx,ebx
005CE7C7 8BC6 mov eax,esi
005CE7C9 E8B6FFFFFF call subprogram_aux

MyFormU.pas.76: z := normal_aux(n, z);
005CE82D 8BD3 mov edx,ebx
005CE82F 8BC6 mov eax,esi
005CE831 E8B6FFFFFF call normal_aux

两者都编译为完全相同的东西。

内联时会发生什么?

MyFormU.pas.76: z := normal_aux(n, z);
005CE804 8BD3 mov edx,ebx
005CE806 8BC8 mov ecx,eax
005CE808 49 dec ecx
005CE809 85C9 test ecx,ecx
005CE80B 7C11 jl $005ce81e
005CE80D 41 inc ecx
005CE80E 33C0 xor eax,eax
005CE810 3BD0 cmp edx,eax
005CE812 7D04 jnl $005ce818
005CE814 03D0 add edx,eax
005CE816 EB02 jmp $005ce81a
005CE818 2BD0 sub edx,eax
005CE81A 40 inc eax
005CE81B 49 dec ecx
005CE81C 75F2 jnz $005ce810

subprogram_main:

MyFormU.pas.47: z := subprogram_aux(n, z);
005CE7A8 8BD3 mov edx,ebx
005CE7AA 8BC8 mov ecx,eax
005CE7AC 49 dec ecx
005CE7AD 85C9 test ecx,ecx
005CE7AF 7C11 jl $005ce7c2
005CE7B1 41 inc ecx
005CE7B2 33C0 xor eax,eax
005CE7B4 3BD0 cmp edx,eax
005CE7B6 7D04 jnl $005ce7bc
005CE7B8 03D0 add edx,eax
005CE7BA EB02 jmp $005ce7be
005CE7BC 2BD0 sub edx,eax
005CE7BE 40 inc eax
005CE7BF 49 dec ecx
005CE7C0 75F2 jnz $005ce7b4

再说一次,没有区别。

我还分析了这个小示例,每个示例(正常程序和子程序)平均执行 30 次,以随机顺序调用:

constructor TForm1.Create(AOwner: TComponent);
const
c_nSamples = 60;
rnd_sample : array[0..c_nSamples - 1] of byte = (1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0);
var
subprogram_gt_ns : Int64;
normal_gt_ns : Int64;
rnd_input : Integer;
i : Integer;
begin
inherited Create(AOwner);

normal_gt_ns := 0;
subprogram_gt_ns := 0;

rnd_input := Random(1000);

for i := 0 to c_nSamples - 1 do
if (rnd_sample[i] = 1) then
Inc(subprogram_gt_ns, subprogram_main(rnd_input))
else
Inc(normal_gt_ns, normal_main(rnd_input));

OutputDebugString(PChar(' Normal ' + FloatToStr(normal_gt_ns / 30) + ' Subprogram ' + FloatToStr(subprogram_gt_ns / 30)));
end;

即使关闭优化也没有显着差异:

Debug Output:  Normal 1166,66666666667 Subprogram 1203,33333333333 Process MyProject.exe (1824)

最后,两篇关于性能警告的文本​​都提到了有关共享局部变量的内容。

如果我们不将 z 传递给 subprogram_aux,而是直接访问它,我们会得到:

MyFormU.pas.47: z := subprogram_aux(n);
005CE7D2 55 push ebp
005CE7D3 8BC3 mov eax,ebx
005CE7D5 E8AAFFFFFF call subprogram_aux
005CE7DA 59 pop ecx
005CE7DB 8945FC mov [ebp-$04],eax

即使打开了优化。

关于delphi - 为什么 Delphi 中的嵌套子例程会产生性能损失?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55832314/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com