c++ - Why is linker optimization so poor?

Recently a colleague pointed out to me that compiling everything into a single file produces more efficient code than compiling separate object files - even with link-time optimization turned on. In addition, the total compile time of the project dropped significantly. Given that one of the main reasons to use C++ is code efficiency, this surprised me.

Apparently, when the archiver/linker makes a library out of object files, or links them into an executable, even simple optimizations take a penalty. In the example below, trivial inlining costs 1.8% in performance when it is done by the linker instead of the compiler. It seems compiler technology should be advanced enough to handle this fairly common situation, but it isn't happening.

Here is a simple example using Visual Studio 2008:

#include <cstdlib>
#include <iostream>
#include <boost/timer.hpp>

using namespace std;

int foo(int x);                  // defined in foo.cpp, so only the linker can inline it
int foo2(int x) { return x++; }  // defined in this translation unit, so the compiler can inline it

int main(int argc, char** argv)
{
    boost::timer t;

    t.restart();
    for (int i = 0; i < atoi(argv[1]); i++)
        foo(i);
    cout << "time : " << t.elapsed() << endl;

    t.restart();
    for (int i = 0; i < atoi(argv[1]); i++)
        foo2(i);
    cout << "time : " << t.elapsed() << endl;
}

foo.cpp

int foo (int x) { return x++; }

Result of running it: using the linked foo instead of the inlined foo2 costs 1.8% in performance.

$ ./release/testlink.exe  100000000
time : 13.375
time : 13.14

And yes, the linker optimization flag (/LTCG) was turned on.

Best Answer

Your colleague is out of date. The technology has been there since 2003 (on the MS C++ compiler): /LTCG. Link-time code generation deals with exactly this problem. As far as I know, GCC has this feature planned for its next-generation compiler.
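For the two files in the question, a minimal sketch of such a build looks roughly like this (testlink.cpp is an assumed name for the first listing, which the question does not name; /GL makes cl emit the intermediate form that the /LTCG link step consumes, and -flto is the comparable switch on current GCC):

cl /O2 /EHsc /GL testlink.cpp foo.cpp /link /LTCG
g++ -O2 -flto testlink.cpp foo.cpp -o testlink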

LTCG not only optimizes code across modules, such as inlining functions, it actually rearranges the code to optimize cache locality and branching for a specific load; see Profile-Guided Optimizations. These options are usually reserved for release builds only, because the build can take hours to finish: it links an instrumented executable, runs a profiling load, and then links again with the profiling results (a command sketch of this workflow follows the list below). That link contains details about what exactly LTCG optimizes:

Inlining – For example, if there exists a function A that frequently calls function B, and function B is relatively small, then profile-guided optimizations will inline function B in function A.

Virtual Call Speculation – If a virtual call, or other call through a function pointer, frequently targets a certain function, a profile-guided optimization can insert a conditionally-executed direct call to the frequently-targeted function, and the direct call can be inlined. (A C++ sketch of this transformation appears after the list.)

Register Allocation – Optimizing with profile data results in better register allocation.

Basic Block Optimization – Basic block optimization allows commonly executed basic blocks that temporally execute within a given frame to be placed in the same set of pages (locality). This minimizes the number of pages used, thus minimizing memory overhead.

Size/Speed Optimization – Functions where the program spends a lot of time can be optimized for speed.

Function Layout – Based on the call graph and profiled caller/callee behavior, functions that tend to be along the same execution path are placed in the same section.

Conditional Branch Optimization – With the value probes, profile-guided optimizations can find if a given value in a switch statement is used more often than other values. This value can then be pulled out of the switch statement. The same can be done with if/else instructions where the optimizer can order the if/else so that either the if or else block is placed first depending on which block is more frequently true.

Dead Code Separation – Code that is not called during profiling is moved to a special section that is appended to the end of the set of sections. This effectively keeps this section out of the often-used pages.

EH Code Separation – The EH code, being exceptionally executed, can often be moved to a separate section when profile-guided optimizations can determine that the exceptions occur only on exceptional conditions.

Memory Intrinsics – The expansion of intrinsics can be decided better if it can be determined if an intrinsic is called frequently. An intrinsic can also be optimized based on the block size of moves or copies.
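As a rough sketch of the instrument/run/relink cycle described above, using the VS2008-era flag spellings (newer toolchains use /GENPROFILE and /USEPROFILE instead, and testlink.cpp is again an assumed file name):

cl /c /O2 /EHsc /GL testlink.cpp foo.cpp
link /LTCG:PGINSTRUMENT testlink.obj foo.obj
testlink.exe 100000000
link /LTCG:PGOPTIMIZE testlink.obj foo.obj

On GCC the corresponding profile-guided build uses -fprofile-generate for the training run and -fprofile-use (typically together with -flto) for the final build.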

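To make the Virtual Call Speculation item above more concrete, here is a small hand-written C++ illustration of the shape of code the optimizer generates on its own; Base, Derived and the dynamic_cast guard are illustrative stand-ins for the compiler's internal vtable comparison against the profiled hot target:

#include <iostream>

struct Base    { virtual int f(int x) const { return x; } virtual ~Base() {} };
struct Derived : Base { virtual int f(int x) const { return x + 1; } };

int call_speculated(const Base* p, int x)
{
    // Profile data says p almost always points at a Derived, so the hot case
    // gets a guarded direct call that can be inlined; the generic virtual
    // dispatch remains only as the cold fallback.
    if (const Derived* d = dynamic_cast<const Derived*>(p))
        return d->Derived::f(x);   // direct, inlinable call to the expected target
    return p->f(x);                // rare path: ordinary virtual dispatch
}

int main()
{
    Derived d;
    Base b;
    std::cout << call_speculated(&d, 1) << ' ' << call_speculated(&b, 1) << '\n';   // prints "2 1"
}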
Regarding c++ - Why is linker optimization so poor?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1401342/
