I am asking this basic question to set the record straight. I have referred to this question and its currently accepted answer, which I do not find convincing. The second most voted answer gives better insight, but it is not perfect either.
While reading below, try to distinguish between the inline keyword and the "inlining" concept.
Here is my take:
The "inlining" concept
This is done to save the call overhead of a function. It is more similar to macro-style code replacement. Nothing to be disputed here.
The inline keyword
Perception A
The inline keyword is a request to the compiler, usually used for smaller functions, so that the compiler can optimize it and make faster calls. The compiler is free to ignore it.
I partially dispute this for the reasons below:
- Larger and/or recursive functions are not inlined anyway, and the compiler ignores the inline keyword completely.
- Smaller functions are automatically inlined by the optimizer irrespective of whether the inline keyword is mentioned or not.
It's quite clear that the user doesn't have any control over function inlining through the keyword inline.
Perception B
inline has nothing to do with the concept of inlining. Putting inline ahead of big/recursive functions won't help, while smaller functions won't need it in order to be inlined.
The only deterministic use of inline is to maintain the One Definition Rule, i.e. if a function is declared inline, then only the things below are mandated:
- Even if its body is found in multiple translation units (e.g. by including that header in multiple .cpp files), the compiler will generate only one definition and avoid a multiple-definition linker error. (Note: if the bodies of that function differ, it is undefined behavior.)
- The body of the inline function has to be visible/accessible in all the translation units that use it. In other words, declaring an inline function in a .h file and defining it in only one .cpp file will result in an "undefined symbol" linker error for the other .cpp files.
Verdict
IMO, perception "A" is entirely wrong and perception "B" is entirely right.
There are some quotes in the standard on this; however, I am expecting an answer that logically explains whether this verdict is correct or not.
Email reply from Bjarne Stroustrup:
"For decades, people have promised that the compiler/optimizer is or will soon be better than humans for inlining. This may be true in theory, but it still isn't in practice for good programmers, especially in an environment where whole-program optimization is not feasible. There are major gains to be had from judicious use of explicit inlining."
"Wrong" is a bit harsh, don't you think? The "A" perspective is the standard explanation given to C programmers learning C++. It's not wrong as such, it's just incomplete. It fails to mention how the rules change concerning multiple declarations and definitions, for example. Combine both perspectives, and you have a more complete picture.
With some compilers, the inline keyword reminds the compiler to inline the function in debug mode or when optimizations are turned off. At higher optimization levels, the compiler may inline small functions regardless of whether the inline keyword is used or not. Remember that the inline keyword can be used with freestanding functions as well. I have this concept working on the IAR EWARM compiler.
There's a problem with the basic premise of this question, which is that the language definition/standard, and therefore any answer based on the language-lawyering game, has absolutely no direct connection to what the compiler actually does. E.g. "smaller functions are automatically inlined by the optimizer": what optimizer? What if I'm using TCC, or -O0? What if that feature isn't finished yet on my platform? What if I'm using a pessimizing compiler (I think this was discussed here before)? The language doesn't define this stuff; perception A is entirely implementation-dependent.
@MooingDuck: there is a question; it's just not obviously marked with a question mark. The very last sentence says he wants to know "if this verdict is true or false." It's a roundabout way of asking "I've drawn this conclusion; is it correct?"
I wasn't sure about your claim:
Smaller functions are automatically "inlined" by the optimizer irrespective of whether inline is mentioned or not... It's quite clear that the user doesn't have any control over function "inlining" with the use of the keyword inline.
I've heard that compilers are free to ignore your inline request, but I didn't think they disregarded it completely.
I looked through the GitHub repositories for Clang and LLVM to find out. (Thanks, open source software!) I found out that the inline keyword does make Clang/LLVM more likely to inline a function.
The Search
Searching for the word inline in the Clang repository leads to the token specifier kw_inline. It looks like Clang uses a clever macro-based system to build the lexer and other keyword-related functions, so there's nothing direct like if (tokenString == "inline") return kw_inline; to be found. But here in ParseDecl.cpp, we see that kw_inline results in a call to DeclSpec::setFunctionSpecInline().
case tok::kw_inline:
isInvalid = DS.setFunctionSpecInline(Loc, PrevSpec, DiagID);
break;
Inside that function, we set a bit and emit a warning if it's a duplicate inline:
if (FS_inline_specified) {
DiagID = diag::warn_duplicate_declspec;
PrevSpec = "inline";
return true;
}
FS_inline_specified = true;
FS_inlineLoc = Loc;
return false;
Searching for FS_inline_specified elsewhere, we see that it's a single bit in a bitfield, and that it's used in a getter function, isInlineSpecified():
bool isInlineSpecified() const {
return FS_inline_specified | FS_forceinline_specified;
}
Searching for call sites of isInlineSpecified(), we find the codegen, where the C++ parse tree is converted into LLVM intermediate representation:
if (!CGM.getCodeGenOpts().NoInline) {
for (auto RI : FD->redecls())
if (RI->isInlineSpecified()) {
Fn->addFnAttr(llvm::Attribute::InlineHint);
break;
}
} else if (!FD->hasAttr<AlwaysInlineAttr>())
Fn->addFnAttr(llvm::Attribute::NoInline);
Clang to LLVM
We are done with the C++ parsing stage. Now our inline specifier has been converted into an attribute of the language-neutral LLVM Function object. We switch from Clang to the LLVM repository.
Searching for llvm::Attribute::InlineHint yields the method Inliner::getInlineThreshold(CallSite CS) (with a scary-looking braceless if block):
// Listen to the inlinehint attribute when it would increase the threshold
// and the caller does not need to minimize its size.
Function *Callee = CS.getCalledFunction();
bool InlineHint = Callee && !Callee->isDeclaration() &&
Callee->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
Attribute::InlineHint);
if (InlineHint && HintThreshold > thres
&& !Caller->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
Attribute::MinSize))
thres = HintThreshold;
So we already have a baseline inlining threshold from the optimization level and other factors, but if it's lower than the global HintThreshold, we bump it up. (HintThreshold is settable from the command line.)
getInlineThreshold() appears to have only one call site, a member of SimpleInliner:
InlineCost getInlineCost(CallSite CS) override {
return ICA->getInlineCost(CS, getInlineThreshold(CS));
}
It calls a virtual method, also named getInlineCost, on its member pointer to an instance of InlineCostAnalysis.
Searching for ::getInlineCost() to find the versions that are class members, we find one that's a member of AlwaysInline (a non-standard but widely supported compiler feature) and another that's a member of InlineCostAnalysis. The latter uses its Threshold parameter here:
CallAnalyzer CA(Callee->getDataLayout(), *TTI, AT, *Callee, Threshold);
bool ShouldInline = CA.analyzeCall(CS);
CallAnalyzer::analyzeCall() is over 200 lines and does the real nitty-gritty work of deciding whether the function is inlineable. It weighs many factors, but as we read through the method we see that all its computations either manipulate the Threshold or the Cost. And at the end:
return Cost < Threshold;
But the return value named ShouldInline is really a misnomer. In fact, the main purpose of analyzeCall() is to set the Cost and Threshold member variables on the CallAnalyzer object. The return value only indicates the case when some other factor has overridden the cost-vs-threshold analysis, as we see here:
// Check if there was a reason to force inlining or no inlining.
if (!ShouldInline && CA.getCost() < CA.getThreshold())
return InlineCost::getNever();
if (ShouldInline && CA.getCost() >= CA.getThreshold())
return InlineCost::getAlways();
Otherwise, we return an object that stores the Cost and Threshold.
return llvm::InlineCost::get(CA.getCost(), CA.getThreshold());
So we're not returning a yes-or-no decision in most cases. The search continues! Where is this return value of getInlineCost() used?
The Real Decision
It's found in bool Inliner::shouldInline(CallSite CS). Another big function. It calls getInlineCost() right at the beginning.
It turns out that getInlineCost() analyzes the intrinsic cost of inlining the function (its argument signature, code length, recursion, branching, linkage, etc.) along with some aggregate information about every place the function is used. On the other hand, shouldInline() combines this information with more data about one specific place where the function is used.
Throughout the method there are calls to InlineCost::costDelta(), which uses the InlineCost's Threshold value as computed by analyzeCall(). Finally, we return a bool. The decision is made. In Inliner::runOnSCC():
if (!shouldInline(CS)) {
emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
Twine(Callee->getName() +
" will not be inlined into " +
Caller->getName()));
continue;
}
// Attempt to inline the function.
if (!InlineCallIfPossible(CS, InlineInfo, InlinedArrayAllocas,
InlineHistoryID, InsertLifetime, DL)) {
emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
Twine(Callee->getName() +
" will not be inlined into " +
Caller->getName()));
continue;
}
++NumInlined;
InlineCallIfPossible() does the inlining based on shouldInline()'s decision.
So the Threshold was affected by the inline keyword, and it is used in the end to decide whether to inline.
Therefore, your Perception B is partly wrong, because at least one major compiler changes its optimization behavior based on the inline keyword.
However, we can also see that inline is only a hint, and other factors may outweigh it.
Both are correct.
The use of inline might, or might not, influence the compiler's decision to inline any particular call to the function. So A is correct: it acts as a non-binding request that calls to the function be inlined, which the compiler is free to ignore.
The semantic effect of inline is to relax the restrictions of the One Definition Rule to allow identical definitions in multiple translation units, as described in B. For many compilers, this is necessary to allow the inlining of function calls: the definition must be available at that point, and compilers are only required to process one translation unit at a time.
I just wanted to post a proof-of-concept example of inline affecting inlining. Here it is:
namespace {
struct S {
auto foo(unsigned int x) {
++x; x *= x;
++x; x *= x;
++x; x *= x;
++x; x *= x;
++x; x *= x;
++x; x *= x;
++x; x *= x;
++x; x *= x;
return x;
}
inline auto bar(unsigned int x) {
++x; x *= foo(x);
++x; x *= foo(x);
++x; x *= foo(x);
++x; x *= foo(x);
++x; x *= foo(x);
return x;
}
auto baz(unsigned int x) {
++x; x *= bar(x);
++x; x *= bar(x);
++x; x *= bar(x);
++x; x *= bar(x);
return x;
}
};
}
int main(int argc, char *argv[]) {
return S().baz(argc);
}
Even with -O3, Clang 16.0.0 doesn't inline bar; you'll see call instructions in the output in that case. But if you add inline to bar, then these calls get fully inlined and the call instructions go away. This is despite the fact that all of these methods are already semantically implicitly inline!
The example is obviously contrived, but I've come across real-world situations where it's resulted in a nontrivial performance difference.
Comment from Bjarne Stroustrup: "For decades, people have promised that the compiler/optimizer is or will soon be better than humans for inlining. This may be true in theory, but it still isn't in practice for good programmers, especially in an environment where whole-program optimization is not feasible. There are major gains to be had from judicious use of explicit inlining."
Yes. Always check the assembler output for performance-critical code. The compiler usually does the right thing, but not always. GCC and Clang have __attribute__((always_inline)) and MSVC has __forceinline, but even those can fail because some functions are not inlineable.
Thanks for the bounty! It was fun to learn more about LLVM internals.
@iammilind: I'm not sure how a "static" compiler could ever expect to know better than humans how to optimize things unless code is annotated to indicate how often various things are going to happen. If a change would make a program 10% faster when processing some files and 50% slower with others, a compiler can't possibly be expected to know whether that change would be good or bad without knowing which kind of files the program will spend more time crunching. A dynamic compiler (JIT) might be able to use execution patterns to make such determinations on the fly, but...
...a static compiler would have no such ability. Static compilers can often outperform dynamic ones in cases where they can be steered into optimizing for the proper cases, but that process generally requires knowledge of the future that compilers can't possibly predict.
Of course, there is LTO / whole-program-optimization which does not rely on having the definition in the same TU as the use.
@Deduplicator: Indeed, that's why I qualified it with "for many compilers". I thought about adding a brief description of other schemes, but that's rather beyond the scope of a simple question.