
c++ - Why is inline assembly sometimes faster and sometimes slower in this code? Execution time varies a lot from run to run

Reposted. Author: 行者123. Updated: 2023-11-28 04:23:39

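I wrote some C++ code to compare the timing of plain C++ against inline assembly. At first I was just playing around, but then I noticed that I get different results every time I run the program. Sometimes the C++ is faster, sometimes the inline assembly is faster, and sometimes they are the same.

What is going on here?

Here is the code, followed by the program output: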

#define TRIALS 1000000
#include <chrono>
#include <iostream>
using namespace std;
typedef std::chrono::high_resolution_clock Clock;
int main()
{
auto t1 = Clock::now();
auto t2 = Clock::now();
int X3=17;
int X2=17;
int X4=17;
int X=17;



int sum=0;
int avg=0;
cout << "=================================" << endl;
cout << "| var*=10; |" << endl;
cout << "=================================" << endl;

for( int i=0; i<TRIALS; i++ )
{
X3=17;
t1 = Clock::now();
X3*=10;
t2 = Clock::now();
sum+=chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
}
avg=sum/TRIALS;
cout << "| Product: " << X3<< " "<< avg << " nanoseconds |" << endl;
cout << "=================================" << endl;
cout << endl << endl;

avg=sum=0;
cout << "=================================" << endl;
cout << "| use inline assembler with shl |" << endl;
cout << "=================================" << endl;

for( int i=0; i<TRIALS; i++ )
{
X=17;
t1 = Clock::now();
asm /*volatile*/ (
"movl %0, %%eax;" // X->ax
"shll %%eax;"// ax*=2
"movl %%eax, %%ebx;" // ax->bx
"shll %%eax;" // ax*=2
"shll %%eax;" // ax*=2
"add %%ebx, %%eax;" // bx+ax->ax
: "=a" (X)
: "a" (X)
: "%ebx"
);
t2 = Clock::now();
sum+=chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
}
avg=sum/TRIALS;
cout << "| Product: " << X << " "<< avg << " nanoseconds |" << endl;
cout << "=================================" << endl;
cout << endl << endl;
avg=sum=0;

cout << "=================================" << endl;
cout << "| var=var*10 |" << endl;
cout << "=================================" << endl;

for( int i=0; i<TRIALS; i++ )
{
X2=17;
t1 = Clock::now();
X2=X2*10;
t2 = Clock::now();
sum+=chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
}
avg=sum/TRIALS;
cout << "| Product: " << X2<< " "<< avg << " nanoseconds |" << endl;
cout << "=================================" << endl;
cout << endl << endl;

avg=sum=0;


cout << "=================================" << endl;
cout << "| use inline assembler with mul |" << endl;
cout << "=================================" << endl;
for( int i=0; i<TRIALS; i++ )
{
X4=17;
t1 = Clock::now();
asm (
"movl %0, %%eax;" // X->ax
"movl $0x0A, %%ebx;" // 10->bx
"mul %%ebx;" // 10*ax->ax
: "=a" (X4)
: "a" (X4)
: "%ebx", "%edx" // mul also writes the high half of the product to edx
);
t2 = Clock::now();
sum+=chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
}
avg=sum/TRIALS;
cout << "| Product: " << X4<< " "<< avg << " nanoseconds |" << endl;
cout << "=================================" << endl;
cout << endl;

return(0);
}

Program output #1:

=================================
| var*=10; |
=================================
| Product: 170 50 nanoseconds |
=================================


=================================
| use inline assembler with shl |
=================================
| Product: 170 50 nanoseconds |
=================================


=================================
| var=var*10 |
=================================
| Product: 170 50 nanoseconds |
=================================


=================================
| use inline assembler with mul |
=================================
| Product: 170 50 nanoseconds |
=================================

Output #2:

=================================
| var*=10; |
=================================
| Product: 170 62 nanoseconds |
=================================


=================================
| use inline assembler with shl |
=================================
| Product: 170 57 nanoseconds |
=================================


=================================
| var=var*10 |
=================================
| Product: 170 59 nanoseconds |
=================================


=================================
| use inline assembler with mul |
=================================
| Product: 170 58 nanoseconds |
=================================

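Best answer

These are more like hints than a "solution":

1) Raise TRIALS by a few orders of magnitude so that you actually measure something in the range of seconds

2) Repeat the measurement several times (n=100 or more) and take the average (if you care about statistics, error of the mean = rms/sqrt(n))

3) Actually measure what you want to measure: at the very least, put only the code you are interested in inside the TRIALS loop, i.e.: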

t1 = Clock::now();  
for( int i=0; i<TRIALS; i++ )
{
... only code relevant for your calculation here ...
}
t2 = Clock::now();
sum = chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();

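4) Finally, consider the Godbolt Compiler Explorer service at https://godbolt.org/, where you can inspect the assembly output of your code under various optimizer settings. For code as simple as yours (I tried it), with -O3 it just does: mov eax,170. So you see: the compiler is smart, and you cannot easily beat it with inline assembly. This is certainly also the case for non-trivial examples.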

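Regarding "c++ - Why is inline assembly sometimes faster and sometimes slower in this code? Execution time varies a lot from run to run", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54931366/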
