
c++ - Compilation of a huge number of expressions fails with -O2 optimization?

Reposted. Author: 行者123. Updated: 2023-12-02 11:00:17

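I am using the Eigen library for some matrix computations. I have to define a large matrix (actually not that large, only 300x300) in which every element is a long complex-exponential expression.

To give you an impression, I have copied a small part of the matrix definition: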

#include <iostream>
#include <complex>
#include <Eigen/Dense>
using namespace Eigen;

int main()
{
typedef std::complex<double> cd;
MatrixXcd h(300,300);
double kx,ky;
kx=1.;
ky=1.;
h.setZero(300,300);
h(0,0)=cd(6.942755,0.) + 0.043986/exp(cd(0,1)*(0. - 2.0238820899708214*kx - 7.55323078829979*ky)) - 0.010802/exp(cd(0,1)*(0. + 5.529348698328969*kx - 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) + 0.043986/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky)) - 0.010802/exp(cd(0,1)*(0. - 5.529348698328969*kx + 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. + 2.0238820899708214*kx + 7.55323078829979*ky));
h(0,2)=cd(0.,0.) + 0.095916/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) - 0.131689/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky));
h(0,3)=cd(-0.10825,0.) - 0.011519/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky));
...
...//6000 more lines omitted here
}

I am using mingw-w64 on Windows, and the compiler is set up correctly. But when I compile with
g++ -O2 code.cpp

the compilation fails and a dialog box pops up!


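If I watch Task Manager closely, the compilation stops at about 1 GB of memory usage.

However, if I compile the code again with the -O0 option, i.e. with all optimizations disabled, the compilation succeeds even though memory usage peaks at nearly 2 GB. So the failure is definitely not caused by running out of memory.

Moreover, I can confirm that this behavior has nothing to do with the Eigen library. Even if I do not use Eigen and replace all the assignments with assignments to the same variable, like this: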
#include <iostream>
#include <complex>

int main()
{
typedef std::complex<double> cd;
cd tmp;
double kx,ky;
kx=1.;
ky=1.;
tmp=cd(6.942755,0.) + 0.043986/exp(cd(0,1)*(0. - 2.0238820899708214*kx - 7.55323078829979*ky)) - 0.010802/exp(cd(0,1)*(0. + 5.529348698328969*kx - 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) + 0.043986/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky)) - 0.010802/exp(cd(0,1)*(0. - 5.529348698328969*kx + 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. + 2.0238820899708214*kx + 7.55323078829979*ky));
tmp=cd(0.,0.) + 0.095916/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) - 0.131689/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky));
tmp=cd(-0.10825,0.) - 0.011519/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky));
... //6000 more lines omitted
}

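the compilation also fails with the -O2 option.

In addition, the problem is not limited to the mingw compiler. I also tried icl.exe from Intel Parallel Studio. The situation is even worse: the compilation took more than 30 minutes and seemed to go on forever. I did not have the patience to wait for it to finish, and it might well have failed in the end anyway.

So my question is: what causes the -O2 compilation to fail? How can I make -O2 work for my code, which contains a huge number of expressions? I am also surprised that, although there are many expressions, they consist only of basic exp calls, so why does compilation need so much time and memory? Are there any tricks to make compilation faster?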

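Update

Following Marc Glisse's suggestion, I ran the following command. -O1 works, but I want at least -O2, because this code is for scientific computing and speed matters.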
R:\>g++ -O1  -ftime-report  eigen.cpp

Execution times (seconds)
phase setup : 0.01 ( 0%) usr 1540 kB ( 0%) ggc
phase parsing : 6.06 ( 5%) usr 412774 kB (25%) ggc
phase lang. deferred : 0.18 ( 0%) usr 6491 kB ( 0%) ggc
phase opt and generate : 122.65 (95%) usr 1203926 kB (74%) ggc
|name lookup : 0.61 ( 0%) usr 39968 kB ( 2%) ggc
|overload resolution : 2.18 ( 2%) usr 151685 kB ( 9%) ggc
garbage collection : 1.48 ( 1%) usr 0 kB ( 0%) ggc
callgraph construction : 0.65 ( 1%) usr 28545 kB ( 2%) ggc
callgraph optimization : 0.41 ( 0%) usr 6 kB ( 0%) ggc
ipa dead code removal : 0.02 ( 0%) usr 0 kB ( 0%) ggc
ipa inlining heuristics : 0.58 ( 0%) usr 6172 kB ( 0%) ggc
ipa reference : 0.02 ( 0%) usr 0 kB ( 0%) ggc
ipa profile : 0.11 ( 0%) usr 0 kB ( 0%) ggc
ipa pure const : 0.20 ( 0%) usr 0 kB ( 0%) ggc
cfg cleanup : 0.04 ( 0%) usr 0 kB ( 0%) ggc
trivially dead code : 0.05 ( 0%) usr 0 kB ( 0%) ggc
df scan insns : 0.09 ( 0%) usr 0 kB ( 0%) ggc
df multiple defs : 0.03 ( 0%) usr 0 kB ( 0%) ggc
df live regs : 0.13 ( 0%) usr 0 kB ( 0%) ggc
df live&initialized regs: 0.04 ( 0%) usr 0 kB ( 0%) ggc
df reg dead/unused notes: 0.17 ( 0%) usr 2440 kB ( 0%) ggc
register information : 0.01 ( 0%) usr 0 kB ( 0%) ggc
alias analysis : 0.05 ( 0%) usr 1546 kB ( 0%) ggc
alias stmt walking : 27.43 (21%) usr 19006 kB ( 1%) ggc
rebuild jump labels : 0.03 ( 0%) usr 0 kB ( 0%) ggc
preprocessing : 0.63 ( 0%) usr 8732 kB ( 1%) ggc
parser (global) : 0.30 ( 0%) usr 80513 kB ( 5%) ggc
parser struct body : 0.36 ( 0%) usr 20184 kB ( 1%) ggc
parser enumerator list : 0.03 ( 0%) usr 1004 kB ( 0%) ggc
parser function body : 3.52 ( 3%) usr 253532 kB (16%) ggc
parser inl. func. body : 0.16 ( 0%) usr 6243 kB ( 0%) ggc
parser inl. meth. body : 0.24 ( 0%) usr 12261 kB ( 1%) ggc
template instantiation : 0.75 ( 1%) usr 36791 kB ( 2%) ggc
early inlining heuristics: 0.74 ( 1%) usr 78738 kB ( 5%) ggc
inline parameters : 0.60 ( 0%) usr 3273 kB ( 0%) ggc
integration : 34.96 (27%) usr 421223 kB (26%) ggc
tree gimplify : 0.93 ( 1%) usr 78917 kB ( 5%) ggc
tree eh : 1.81 ( 1%) usr 147729 kB ( 9%) ggc
tree CFG construction : 0.26 ( 0%) usr 47487 kB ( 3%) ggc
tree CFG cleanup : 0.92 ( 1%) usr 0 kB ( 0%) ggc
tree copy propagation : 0.03 ( 0%) usr 0 kB ( 0%) ggc
tree PTA : 1.80 ( 1%) usr 167 kB ( 0%) ggc
tree PHI insertion : 0.07 ( 0%) usr 519 kB ( 0%) ggc
tree SSA rewrite : 1.63 ( 1%) usr 97983 kB ( 6%) ggc
tree SSA other : 0.13 ( 0%) usr 17 kB ( 0%) ggc
tree SSA incremental : 28.75 (22%) usr 5 kB ( 0%) ggc
tree operand scan : 2.13 ( 2%) usr 65917 kB ( 4%) ggc
dominator optimization : 0.08 ( 0%) usr 2043 kB ( 0%) ggc
tree SRA : 2.65 ( 2%) usr 56210 kB ( 3%) ggc
tree CCP : 2.42 ( 2%) usr 37765 kB ( 2%) ggc
tree split crit edges : 0.11 ( 0%) usr 2953 kB ( 0%) ggc
tree reassociation : 0.04 ( 0%) usr 0 kB ( 0%) ggc
tree FRE : 3.35 ( 3%) usr 35524 kB ( 2%) ggc
tree code sinking : 0.01 ( 0%) usr 0 kB ( 0%) ggc
tree linearize phis : 0.01 ( 0%) usr 6 kB ( 0%) ggc
tree backward propagate : 0.02 ( 0%) usr 0 kB ( 0%) ggc
tree forward propagate : 0.38 ( 0%) usr 8 kB ( 0%) ggc
tree conservative DCE : 0.13 ( 0%) usr 1 kB ( 0%) ggc
tree aggressive DCE : 0.33 ( 0%) usr 2 kB ( 0%) ggc
tree DSE : 0.45 ( 0%) usr 4 kB ( 0%) ggc
tree SSA uncprop : 0.01 ( 0%) usr 0 kB ( 0%) ggc
dominance frontiers : 0.06 ( 0%) usr 0 kB ( 0%) ggc
dominance computation : 0.65 ( 1%) usr 0 kB ( 0%) ggc
out of ssa : 0.09 ( 0%) usr 1 kB ( 0%) ggc
expand vars : 0.02 ( 0%) usr 765 kB ( 0%) ggc
expand : 0.13 ( 0%) usr 13796 kB ( 1%) ggc
post expand cleanups : 0.03 ( 0%) usr 2868 kB ( 0%) ggc
forward prop : 0.08 ( 0%) usr 156 kB ( 0%) ggc
CSE : 0.08 ( 0%) usr 304 kB ( 0%) ggc
dead code elimination : 0.03 ( 0%) usr 0 kB ( 0%) ggc
dead store elim1 : 0.09 ( 0%) usr 763 kB ( 0%) ggc
dead store elim2 : 0.08 ( 0%) usr 613 kB ( 0%) ggc
loop init : 0.15 ( 0%) usr 65 kB ( 0%) ggc
branch prediction : 0.12 ( 0%) usr 19 kB ( 0%) ggc
combiner : 0.10 ( 0%) usr 216 kB ( 0%) ggc
if-conversion : 0.01 ( 0%) usr 0 kB ( 0%) ggc
integrated RA : 0.43 ( 0%) usr 9659 kB ( 1%) ggc
LRA non-specific : 0.26 ( 0%) usr 305 kB ( 0%) ggc
LRA virtuals elimination: 0.03 ( 0%) usr 304 kB ( 0%) ggc
LRA create live ranges : 0.03 ( 0%) usr 152 kB ( 0%) ggc
LRA hard reg assignment : 0.02 ( 0%) usr 0 kB ( 0%) ggc
reload CSE regs : 0.19 ( 0%) usr 916 kB ( 0%) ggc
thread pro- & epilogue : 0.04 ( 0%) usr 14 kB ( 0%) ggc
hard reg cprop : 0.07 ( 0%) usr 0 kB ( 0%) ggc
shorten branches : 0.08 ( 0%) usr 0 kB ( 0%) ggc
final : 0.16 ( 0%) usr 279 kB ( 0%) ggc
initialize rtl : 0.01 ( 0%) usr 12 kB ( 0%) ggc
rest of compilation : 0.31 ( 0%) usr 879 kB ( 0%) ggc
remove unused locals : 2.24 ( 2%) usr 0 kB ( 0%) ggc
address taken : 1.00 ( 1%) usr 37564 kB ( 2%) ggc
rebuild frequencies : 0.02 ( 0%) usr 0 kB ( 0%) ggc
TOTAL : 128.90 1624743 kB

Best answer

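I see some redundancy in the expressions, for example:

exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)), which appears in both h(0,2) and h(0,3).
-O2 forces the compiler to detect and reuse such patterns. The complexity of 6k lines of expressions seems to be too high for it. You can help gcc by using tmp variables. This is equivalent to building a dependency graph and then generating the code.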
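
The temporary-variable idea can be sketched as follows, using only the two elements h(0,2) and h(0,3) quoted in the question. The names Elements, elements_direct, elements_hoisted, inv_e1 and inv_e2 are hypothetical and introduced purely for illustration. This is a sketch of the technique under those assumptions; it does not show that the full 6000-line file then compiles with -O2.

```cpp
#include <complex>

typedef std::complex<double> cd;

// Hypothetical container for two of the question's matrix elements.
struct Elements {
    cd h02;  // corresponds to h(0,2)
    cd h03;  // corresponds to h(0,3)
};

// As written in the question: every exp(...) is spelled out in full,
// so the same phase factor is evaluated in both assignments.
Elements elements_direct(double kx, double ky)
{
    Elements e;
    e.h02 = cd(0., 0.)
          + 0.095916 / std::exp(cd(0, 1) * (0. - 7.55323078829979 * kx - 2.0238820899708214 * ky))
          - 0.131689 / std::exp(cd(0, 1) * (0. + 7.55323078829979 * kx + 2.0238820899708214 * ky));
    e.h03 = cd(-0.10825, 0.)
          - 0.011519 / std::exp(cd(0, 1) * (0. - 7.55323078829979 * kx - 2.0238820899708214 * ky));
    return e;
}

// With temporaries: each distinct phase factor is computed once and
// reused, so the optimizer no longer has to discover the repetition.
Elements elements_hoisted(double kx, double ky)
{
    const cd i(0, 1);
    const cd inv_e1 = 1. / std::exp(i * (-7.55323078829979 * kx - 2.0238820899708214 * ky));
    const cd inv_e2 = 1. / std::exp(i * ( 7.55323078829979 * kx + 2.0238820899708214 * ky));

    Elements e;
    e.h02 = 0.095916 * inv_e1 - 0.131689 * inv_e2;
    e.h03 = cd(-0.10825, 0.) - 0.011519 * inv_e1;
    return e;
}
```

Both functions should agree up to floating-point rounding, since the hoisted version multiplies by a precomputed reciprocal instead of dividing each time. In a generated file, the temporaries would be emitted once at the top and the matrix assignments would refer to them.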

Regarding "c++ - Compilation of a huge number of expressions fails with -O2 optimization?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44087703/
