
c++ - Strange program latency behavior on a VM

Reposted. Author: 太空狗. Updated: 2023-10-29 12:03:53

I wrote a program that reads a 256 KB array, expecting each pass to take about 1 ms. The program is simple and attached below. However, when I run it in a VM on Xen, the latency is unstable and shows the following pattern (the time unit is milliseconds):

    #totalCycle CyclePerLine  totalms
22583885 5513 6.452539
3474342 848 0.992669
3208486 783 0.916710
25848572 6310 7.385306
3225768 787 0.921648
3210487 783 0.917282
25974700 6341 7.421343
3244891 792 0.927112
3276027 799 0.936008
25641513 6260 7.326147
3531084 862 1.008881
3233687 789 0.923911
22397733 5468 6.399352
3523403 860 1.006687
3586178 875 1.024622
26094384 6370 7.455538
3540329 864 1.011523
3812086 930 1.089167
25907966 6325 7.402276

I suspect some process is doing something periodically; it behaves like an event-driven process. Has anyone run into this? Or can anyone point to a process/service that could cause this behavior?

My program is below. I ran it 1000 times; each run produced one line of the results above.

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <ctime>
#include <algorithm> // std::swap

using namespace std;

#if defined(__i386__)
static __inline__ unsigned long long rdtsc(void)
{
    unsigned long long int x;
    __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
    return x;
}
#elif defined(__x86_64__)
static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}
#endif

#define CACHE_LINE_SIZE 64

#define WSS 24567 /* working-set size in KB (~24 MB); unused below */
#define NUM_VARS (WSS * 1024 / sizeof(long)) /* unused below */

#define KHZ 3500000 /* TSC frequency in kHz (3.5 GHz) */

// usage: ./a.out repeat_count
int main(int argc, char** argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s repeat_count\n", argv[0]);
        return 1;
    }
    unsigned long wcet = atol(argv[1]);
    unsigned long mem_size_KB = 256;                // mem size in KB
    unsigned long mem_size_B = mem_size_KB * 1024;  // mem size in bytes
    unsigned long count = mem_size_B / sizeof(long);
    unsigned long row = mem_size_B / CACHE_LINE_SIZE;
    int col = CACHE_LINE_SIZE / sizeof(long);

    unsigned long long start, finish, dur1;
    unsigned long temp;

    long *buffer = new long[count];

    // init array
    for (unsigned long i = 0; i < count; ++i)
        buffer[i] = i;

    // shuffle the cache-line heads into a random pointer-chasing chain
    // (one accessed element per cache line)
    for (unsigned long i = row - 1; i > 0; --i) {
        temp = rand() % i;
        swap(buffer[i * col], buffer[temp * col]);
    }

    // warm the cache again
    temp = buffer[0];
    for (unsigned long i = 0; i < row - 1; ++i) {
        temp = buffer[temp];
    }

    // First read, should be a cache hit
    temp = buffer[0];
    start = rdtsc();
    long sum = 0;
    for (unsigned long wcet_i = 0; wcet_i < wcet; wcet_i++)
    {
        for (int j = 0; j < 21; j++)
        {
            for (unsigned long i = 0; i < row - 1; ++i) {
                if (i % 2 == 0) sum += buffer[temp];
                else            sum -= buffer[temp];
                temp = buffer[temp];
            }
        }
    }
    finish = rdtsc();
    dur1 = finish - start;

    // Result: total cycles, cycles per cache line, total milliseconds
    printf("%llu %llu %.6f\n", dur1, dur1 / row, dur1 * 1.0 / KHZ);
    delete[] buffer;
    return 0;
}

Best answer

Using the RDTSC instruction inside a virtual machine is complicated. Most likely the hypervisor (Xen) is emulating RDTSC by trapping it. Your fastest runs show about 800 cycles per cache line, which is very, very slow... the only explanation is that each RDTSC causes a trap handled by the hypervisor, and that overhead is the bottleneck. I'm not sure about the longer times you see periodically, but given that RDTSC is being trapped, all timing bets are off.

You can read more about it here:

http://xenbits.xen.org/docs/4.2-testing/misc/tscmode.txt

Instructions in the rdtsc family are non-privileged, but privileged software may set a cpuid bit to cause all rdtsc family instructions to trap. This trap can be detected by Xen, which can then transparently "emulate" the results of the rdtsc instruction and return control to the code following the rdtsc instruction

Incidentally, that document is wrong in saying the hypervisor sets a cpuid bit to make RDTSC trap; it is actually bit #2 in Control Register 4 (CR4.TSD):

http://en.wikipedia.org/wiki/Control_register#CR4

Regarding "c++ - Strange program latency behavior on a VM", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22579864/
