
c++ - Performance improvement strategies for a VM / interpreter?


I've written a simple VM in C, with a plain instruction switch and no instruction decoding, but the performance is terrible.

For simple arithmetic operations, the VM is about 4000 times slower than native C code doing the same work. I tested with a set of arrays of length 10 million: the first holds the program instructions (random +, -, *, / operations), two arrays hold random integers, and the third array is the destination storage for the results.

I expected the arithmetic performance to drop by a factor of 3-4, so 4000x really took me by surprise. Even the slowest interpreted languages seem to deliver better performance than that. So where did my approach go wrong, and how can I improve performance without resorting to JIT compilation to machine code?

The implementation is... basically the simplest thing I could come up with:

begin:
{
    switch (*(op+(c++)))
    {
    case 0:
        add(in1+c, in2+c, out+c); goto begin;

    case 1:
        sub(in1+c, in2+c, out+c); goto begin;

    case 2:
        mul(in1+c, in2+c, out+c); goto begin;

    case 3:
        div(in1+c, in2+c, out+c); goto begin;

    case 4:
        cout << "end of program" << endl;
        goto end;

    default:
        cout << "ERROR!!!" << endl;
    }
}

end:

Update: I was looking into the program length when I noticed that the QElapsedTimer I had been using for profiling was actually broken. Now I'm using the clock() function, and according to it the computed goto actually runs about the same as the native code, possibly slightly slower. Is that result legitimate??? Here is the complete source (I know it's ugly; after all, it's only for testing):

#include <QtGlobal>
#include <iostream>
#include <stdio.h>
#include <ctime>

using namespace std;

#define LENGTH 70000000

void add(int & a, int & b, int & r) {r = a + b;}
void sub(int & a, int & b, int & r) {r = a - b;}
void mul(int & a, int & b, int & r) {r = a * b;}
void div(int & a, int & b, int & r) {r = a / b;}

int main()
{
    char * op = new char[LENGTH];
    int * in1 = new int[LENGTH];
    int * in2 = new int[LENGTH];
    int * out = new int[LENGTH];

    // Fill the "program": a repeating add/sub/mul/div pattern plus random operands.
    for (int i = 0; i < LENGTH; ++i)
    {
        *(op+i) = i % 4;
        *(in1+i) = qrand();
        *(in2+i) = qrand()+1;
    }

    *(op+LENGTH-1) = 4; // end of program

    long long sClock, fClock;

    unsigned int c = 0;
    sClock = clock();

    cout << "Program begins" << endl;

    // Computed-goto dispatch table (GCC/Clang "labels as values" extension).
    static void* table[] = {
        &&do_add,
        &&do_sub,
        &&do_mul,
        &&do_div,
        &&do_end,
        &&do_err,
        &&do_fin};

#define jump() goto *table[op[c++]]

    jump();
do_add:
    add(in1[c], in2[c], out[c]); jump();
do_sub:
    sub(in1[c], in2[c], out[c]); jump();
do_mul:
    mul(in1[c], in2[c], out[c]); jump();
do_div:
    div(in1[c], in2[c], out[c]); jump();
do_end:
    cout << "end of program" << endl; goto *table[6];
do_err:
    cout << "ERROR!!!" << endl; goto *table[6];
do_fin:

    fClock = clock();
    cout << fClock - sClock << endl;

    delete [] op;
    delete [] in1;
    delete [] in2;
    delete [] out;

    in1 = new int[LENGTH];
    in2 = new int[LENGTH];
    out = new int[LENGTH];

    for (int i = 0; i < LENGTH; ++i)
    {
        *(in1+i) = qrand();
        *(in2+i) = qrand()+1;
    }

    cout << "Native begins" << endl;

    sClock = clock();

    // Native reference: the same operation mix, hard-coded.
    for (int i = 0; i < LENGTH; i += 4)
    {
        *(out+i)   = *(in1+i)   + *(in2+i);
        *(out+i+1) = *(in1+i+1) - *(in2+i+1);
        *(out+i+2) = *(in1+i+2) * *(in2+i+2);
        *(out+i+3) = *(in1+i+3) / *(in2+i+3);
    }

    fClock = clock();
    cout << fClock - sClock << endl;

    delete [] in1;
    delete [] in2;
    delete [] out;

    return 0;
}
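The verdict in the update rests entirely on the timer, and clock() measures CPU time rather than wall time, so it is worth cross-checking with a monotonic wall clock. Below is a minimal sketch (not part of the original program) using std::chrono::steady_clock, assuming a C++11 compiler; the dummy loop merely stands in for the VM and native loops above:

#include <chrono>
#include <iostream>

int main()
{
    // Wall-clock timing via a monotonic clock, as a sanity check against clock().
    auto start = std::chrono::steady_clock::now();

    volatile long long sum = 0;            // volatile keeps the loop from being optimized away
    for (long long i = 0; i < 100000000LL; ++i)
        sum += i;

    auto stop = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::cout << "elapsed: " << ms << " ms" << std::endl;
    return 0;
}

If both timers agree, the result is plausible: with four arrays of 70 million elements, both loops stream through hundreds of megabytes of data, so memory traffic rather than dispatch is likely the limiting factor.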

Best Answer

Darek Mihocka has a good, in-depth article on writing fast interpreters in portable C: http://www.emulators.com/docs/nx25_nostradamus.htm
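For illustration only (this sketch is not taken from that article): one common way to speed up an interpreter without a JIT is to decode the opcode stream once into direct handler pointers, so the hot loop performs a single indirect call per instruction instead of re-reading and switching on opcode bytes. A minimal C++ sketch with hypothetical names (Handler, op_add, and so on), using the same 0-3 opcode encoding as the question:

#include <iostream>

// Hypothetical "call-threaded" interpreter: opcodes are decoded once into
// handler pointers, so execution is one indirect call per instruction.
typedef void (*Handler)(int a, int b, int &r);

static void op_add(int a, int b, int &r) { r = a + b; }
static void op_sub(int a, int b, int &r) { r = a - b; }
static void op_mul(int a, int b, int &r) { r = a * b; }
static void op_div(int a, int b, int &r) { r = (b != 0) ? a / b : 0; }

int main()
{
    const int n = 8;
    unsigned char op[n] = {0, 1, 2, 3, 0, 1, 2, 3};  // 0 = +, 1 = -, 2 = *, 3 = /
    int in1[n] = {10, 10, 10, 10, 9, 9, 9, 9};
    int in2[n] = {3, 3, 3, 3, 2, 2, 2, 2};
    int out[n] = {0};

    static const Handler table[] = {op_add, op_sub, op_mul, op_div};

    // Decode phase: translate opcode bytes into handler pointers once.
    Handler program[n];
    for (int i = 0; i < n; ++i)
        program[i] = table[op[i]];

    // Execute phase: no switch and no opcode lookup in the hot loop.
    for (int i = 0; i < n; ++i)
        program[i](in1[i], in2[i], out[i]);

    for (int i = 0; i < n; ++i)
        std::cout << out[i] << ' ';
    std::cout << std::endl;
    return 0;
}

Whether this beats a plain switch or a computed goto depends heavily on the compiler and the CPU's branch predictor, so it is worth measuring rather than assuming.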

Regarding "c++ - Performance improvement strategies for a VM / interpreter?", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/11720357/
