
java - Program exceeds theoretical memory transfer rate

Reposted. Author: IT王子. Updated: 2023-10-28 23:29:56

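I have a laptop with an Intel Core 2 Duo 2.4 GHz CPU and 2x4 GB DDR3 modules at 1066 MHz.

I expected this memory to operate at 1067 MiB/sec, and since there are two channels, the maximum speed would be 2134 MiB/sec (if the OS memory dispatcher allows it).

I made a small Java application to test that: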

import java.util.Arrays;
import java.util.Random;

// Class name chosen for illustration; the original snippet showed only the members.
public class MemoryTest {

    private static final int size = 256 * 1024 * 1024; // 256 MiB
    private static final byte[] storage = new byte[size];

    private static final int s = 1024; // 1 KiB
    private static final int duration = 10; // 10 sec

    public static void main(String[] args) {
        // Write test: copy a 1 KiB buffer to random positions in storage.
        long start = System.currentTimeMillis();
        Random rnd = new Random();
        byte[] buf1 = new byte[s];
        rnd.nextBytes(buf1);
        long count = 0;
        while (System.currentTimeMillis() - start < duration * 1000) {
            long begin = (long) (rnd.nextDouble() * (size - s));
            System.arraycopy(buf1, 0, storage, (int) begin, s);
            ++count;
        }
        double totalSeconds = (System.currentTimeMillis() - start) / 1000.0;
        double speed = count * s / totalSeconds / 1024 / 1024;
        System.out.println(count * s + " bytes transferred in " + totalSeconds + " secs (" + speed + " MiB/sec)");

        // Read test: copy 1 KiB from random positions in storage into a buffer.
        byte[] buf2 = new byte[s];
        count = 0;
        start = System.currentTimeMillis();
        while (System.currentTimeMillis() - start < duration * 1000) {
            long begin = (long) (rnd.nextDouble() * (size - s));
            System.arraycopy(storage, (int) begin, buf2, 0, s);
            Arrays.fill(buf2, (byte) 0);
            ++count;
        }
        totalSeconds = (System.currentTimeMillis() - start) / 1000.0;
        speed = count * s / totalSeconds / 1024 / 1024;
        System.out.println(count * s + " bytes transferred in " + totalSeconds + " secs (" + speed + " MiB/sec)");
    }
}

I expected the result to be below 2134 MiB/sec, but I got the following:

17530212352 bytes transferred in 10.0 secs (1671.811328125 MiB/sec)
31237926912 bytes transferred in 10.0 secs (2979.080859375 MiB/sec)

How is a speed of nearly 3 GiB/sec possible?

[Image: DDR3 module photo]

Best answer

There are several things at work here.

First: the formula for the memory transfer rate of DDR3, with the memory clock rate given in MHz, is

memory clock rate
× 4 (for bus clock multiplier)
× 2 (for data rate)
× 64 (number of bits transferred)
/ 8 (number of bits/byte)
= memory clock rate × 64 (in MB/s)

For DDR3-1066 (which is clocked at 133⅓ MHz), we obtain a theoretical memory bandwidth of 8533⅓ MB/s, or 8138.02083333... MiB/s, for single channel, and 17066⅔ MB/s, or 16276.0416666... MiB/s, for dual channel.
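The arithmetic above can be checked with a few lines of Java. This is only a sketch of the formula as stated; the class and method names (`Ddr3Bandwidth`, `peakMBps`, `toMiBps`) are made up for illustration.

```java
public class Ddr3Bandwidth {

    /** Theoretical peak rate in MB/s (10^6 bytes/s) for a memory clock given in MHz. */
    public static double peakMBps(double memoryClockMHz, int channels) {
        // 4 (bus clock multiplier) * 2 (double data rate) * 64 bits / 8 bits per byte = 64
        return memoryClockMHz * 4 * 2 * 64 / 8 * channels;
    }

    /** Converts MB/s (10^6 bytes/s) to MiB/s (2^20 bytes/s). */
    public static double toMiBps(double mbPerSec) {
        return mbPerSec * 1_000_000 / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        double clockMHz = 400.0 / 3.0; // DDR3-1066: 133 1/3 MHz memory clock
        for (int channels = 1; channels <= 2; channels++) {
            double mb = peakMBps(clockMHz, channels);
            System.out.println(channels + " channel(s): " + mb + " MB/s = "
                    + toMiBps(mb) + " MiB/s");
        }
    }
}
```

Passing 200 MHz instead of 133⅓ MHz gives the corresponding figures for DDR3-1600.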

Second: transferring one big chunk of data is faster than transferring many small chunks of data.
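One way to see the effect of chunk size is to copy the same array in pieces of different sizes and compare the elapsed times. The following is a rough sketch, not a rigorous benchmark: it has no JIT warm-up, and the class name `ChunkedCopy` and the chosen sizes are arbitrary assumptions.

```java
import java.util.Arrays;

public class ChunkedCopy {

    /** Copies src into dest in pieces of chunkSize bytes; returns elapsed nanoseconds. */
    public static long copyInChunks(byte[] src, byte[] dest, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long start = System.nanoTime();
        for (int off = 0; off < src.length; off += chunkSize) {
            int len = Math.min(chunkSize, src.length - off);
            System.arraycopy(src, off, dest, off, len);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] src = new byte[64 * 1024 * 1024]; // 64 MiB, an arbitrary test size
        byte[] dest = new byte[src.length];
        Arrays.fill(src, (byte) 1);
        // 1 KiB chunks (as in the question), 64 KiB chunks, and one single copy
        for (int chunk : new int[] {1024, 64 * 1024, src.length}) {
            long nanos = copyInChunks(src, dest, chunk);
            System.out.println(chunk + " B chunks: " + nanos / 1_000_000.0 + " ms");
        }
    }
}
```

The absolute numbers depend heavily on the machine and the JVM, so only the relative differences between runs on the same system are meaningful.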

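Third: the test ignores caching effects that can occur. For example, the 1 KiB buffer is reused on every iteration and will most likely stay in the CPU cache.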
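A simple way to make caching visible is to copy the same total number of bytes once with a block small enough to fit in the CPU cache and once with a block much larger than it. The sketch below assumes that 16 KiB fits into cache and 64 MiB does not, which holds for typical desktop CPUs but is not guaranteed; `WorkingSetCopy` and `copyRateGiBps` are illustrative names.

```java
public class WorkingSetCopy {

    // Written after the loop so the copies cannot be treated as dead code.
    private static volatile byte sink;

    /** Copies the same block of blockSize bytes `repeats` times; returns GiB/s. */
    public static double copyRateGiBps(int blockSize, int repeats) {
        byte[] src = new byte[blockSize];
        byte[] dest = new byte[blockSize];
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            System.arraycopy(src, 0, dest, 0, blockSize);
        }
        long nanos = Math.max(1, System.nanoTime() - start);
        sink = dest[blockSize - 1];
        double bytes = (double) blockSize * repeats;
        return bytes / nanos * 1e9 / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Both runs move 1 GiB in total; only the working set differs.
        System.out.println("16 KiB block: " + copyRateGiBps(16 * 1024, 65_536) + " GiB/s");
        System.out.println("64 MiB block: " + copyRateGiBps(64 * 1024 * 1024, 16) + " GiB/s");
    }
}
```

If the small block reports a much higher rate, that rate mostly reflects cache bandwidth rather than main memory bandwidth.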

Fourth: for time measurements, you should use System.nanoTime(). This method is more precise.

Here is a rewritten version of the test program¹:
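As a minimal illustration of that point, elapsed time can be taken as the difference of two System.nanoTime() readings. Unlike System.currentTimeMillis(), it is meant for measuring intervals and is not tied to the wall clock. The helper name `timeNanos` and the 16 MiB buffer size are assumptions made for this sketch.

```java
import java.util.concurrent.TimeUnit;

public class NanoTimer {

    /** Runs the task once and returns the elapsed time in nanoseconds. */
    public static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] src = new byte[16 * 1024 * 1024];
        byte[] dest = new byte[src.length];
        long nanos = timeNanos(() -> System.arraycopy(src, 0, dest, 0, src.length));
        System.out.println("Copy took " + TimeUnit.NANOSECONDS.toMicros(nanos) + " us");
    }
}
```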

import java.util.Random;

public class Main {

    public static void main(String... args) {
        final int SIZE = 1024 * 1024 * 1024; // 1 GiB per array
        final int RUNS = 8;
        final int THREADS = 8;
        final int TSIZE = SIZE / THREADS;    // bytes copied by each thread
        assert (TSIZE * THREADS == SIZE) : "THREADS must divide SIZE!";
        byte[] src = new byte[SIZE];
        byte[] dest = new byte[SIZE];
        Random r = new Random();
        long timeNano = 0;

        Thread[] threads = new Thread[THREADS];
        for (int i = 0; i < RUNS; ++i) {
            System.out.print("Initializing src... ");
            for (int idx = 0; idx < SIZE; ++idx) {
                src[idx] = ((byte) r.nextInt(256));
            }
            System.out.println("done!");
            System.out.print("Starting test... ");
            for (int idx = 0; idx < THREADS; ++idx) {
                final int from = TSIZE * idx;
                // Each thread copies its own slice to the same offset in dest,
                // so the threads never write to overlapping regions.
                threads[idx]
                        = new Thread(() -> {
                            System.arraycopy(src, from, dest, from, TSIZE);
                        });
            }
            // Only thread start, copy, and join are timed.
            long start = System.nanoTime();
            for (int idx = 0; idx < THREADS; ++idx) {
                threads[idx].start();
            }
            for (int idx = 0; idx < THREADS; ++idx) {
                try {
                    threads[idx].join();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            timeNano += System.nanoTime() - start;
            System.out.println("done!");
        }
        double timeSecs = timeNano / 1_000_000_000d;

        System.out.println("Transferred " + (long) SIZE * RUNS
                + " bytes in " + timeSecs + " seconds.");

        System.out.println("-> "
                + ((long) SIZE * RUNS / timeSecs / 1024 / 1024 / 1024)
                + " GiB/s");
    }
}

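This way, the "other computation" is reduced as much as possible, and (almost) only the memory copy rate via System.arraycopy(...) is measured. The algorithm may still have issues with regard to caching.

On my system (dual-channel DDR3-1600), I get around 6 GiB/s, whereas the theoretical limit is about 25.6 GB/s (roughly 23.8 GiB/s) with dual channel.

As was pointed out by Nick Mertin, the JVM introduces some overhead. You should therefore not expect to reach the theoretical limit.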


¹ Side note: to run the program, the JVM must be given more heap space. In my case, 4096 MB were sufficient.
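The maximum heap size is usually raised with the standard `-Xmx` option, for example `java -Xmx4096m Main`. The sketch below shows one way a program could check its heap limit before allocating two 1 GiB arrays. The class name `HeapCheck` is hypothetical, and the check is deliberately coarse: the arrays need somewhat more than their raw size, and other objects also occupy heap space.

```java
public class HeapCheck {

    /** Returns true if the JVM's maximum heap is at least requiredBytes. */
    public static boolean fitsInHeap(long requiredBytes) {
        return Runtime.getRuntime().maxMemory() >= requiredBytes;
    }

    public static void main(String[] args) {
        long required = 2L * 1024 * 1024 * 1024; // src + dest, 1 GiB each
        long maxHeap = Runtime.getRuntime().maxMemory();
        if (fitsInHeap(required)) {
            System.out.println("Max heap of " + maxHeap + " bytes looks sufficient.");
        } else {
            System.out.println("Max heap of " + maxHeap
                    + " bytes is too small; try a larger -Xmx value.");
        }
    }
}
```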

Regarding java - Program exceeds theoretical memory transfer rate, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31213023/
