
java - Comparing get/put of direct and non-direct ByteBuffers


Is get/put from a non-direct ByteBuffer faster than get/put from a direct ByteBuffer?

If I have to read from / write to a direct ByteBuffer, is it better to first read into / write to a thread-local byte array and then (for writes) update the direct ByteBuffer in one go with that byte array?

Best answer

Is get/put from a non-direct bytebuffer faster than get/put from direct bytebuffer ?

If you compare a heap buffer with a direct buffer that does not use the native byte order (most systems are little-endian, while the default for direct ByteBuffers is big-endian), the performance is very similar.

If you use natively ordered byte buffers, the performance for multi-byte values is significantly better. For single bytes it makes little difference whatever you do.
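To make the byte-order point concrete, here is a minimal sketch (the class name and buffer sizes are my own, purely for illustration) showing the default order of a heap and a direct buffer, and how to switch to the native order:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OrderExample {
    public static void main(String... args) {
        // A heap buffer backed by a byte[]; like all ByteBuffers it defaults to BIG_ENDIAN.
        ByteBuffer heap = ByteBuffer.allocate(64 * 1024);

        // A direct (off-heap) buffer; also BIG_ENDIAN by default, so switch it to
        // the platform's native order for fast multi-byte get/put.
        ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024)
                .order(ByteOrder.nativeOrder());

        System.out.println("heap order:   " + heap.order());
        System.out.println("direct order: " + direct.order());
        System.out.println("native order: " + ByteOrder.nativeOrder());
    }
}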

In HotSpot/OpenJDK, ByteBuffer uses the Unsafe class, and many of its native methods are treated as intrinsics. This is JVM dependent; AFAIK the Android VM treats them as intrinsics in recent versions.

If you dump the generated assembly, you can see that the intrinsics in Unsafe are turned into a single machine-code instruction, i.e. they do not have the overhead of a JNI call.

In fact, if you are micro-tuning you may find that most of the time of a ByteBuffer getXxxx or setXxxx is spent in the bounds checking, not the actual memory access. For this reason I still use Unsafe directly when I have to for maximum performance (note: this is discouraged by Oracle).

If I have to read / write from direct bytebuffer , is it better to first read /write in to a thread local byte array and then update ( for writes ) the direct bytebuffer fully with the byte array ?

I can't see how this would be better. ;) It sounds complicated.

Often the simplest solution is better, and faster.
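For reference, the approach described in the question would look roughly like the following hypothetical sketch (the class, the scratch-array size, and the copy method are illustrative only, not something from this answer); note the extra copy through the byte[] that it introduces:

import java.nio.ByteBuffer;

public class BulkCopyExample {
    // Hypothetical thread-local staging array for bulk transfers.
    private static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[64 * 1024]);

    // Copies 'length' bytes from one buffer into another via the scratch array.
    // 'length' must not exceed the scratch array size or either buffer's remaining bytes.
    static void copy(ByteBuffer src, ByteBuffer dst, int length) {
        byte[] scratch = SCRATCH.get();
        src.get(scratch, 0, length);  // bulk read out of the (direct) source buffer
        dst.put(scratch, 0, length);  // bulk write into the (direct) destination buffer
    }
}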


You can test this for yourself with the code below.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferBenchmark {
    public static void main(String... args) {
        ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 10; i++)
            runTest(bb1, bb2);
    }

    private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
        bb1.clear();
        bb2.clear();
        long start = System.nanoTime();
        // Copy bb1 into bb2 one int at a time through the ByteBuffer API.
        while (bb2.remaining() > 0)
            bb2.putInt(bb1.getInt());
        long time = System.nanoTime() - start;
        int operations = bb1.capacity() / 4 * 2; // one getInt plus one putInt per 4 bytes
        System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
    }
}

This prints:

Each putInt/getInt took an average of 83.9 ns
Each putInt/getInt took an average of 1.4 ns
Each putInt/getInt took an average of 34.7 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns

I am fairly sure a JNI call takes longer than 1.2 ns.
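As a variation you could try yourself (my own addition, not part of the original answer), the same runTest from ByteBufferBenchmark above can be fed a heap buffer, so the direct/non-direct comparison is explicit:

// In ByteBufferBenchmark's main, swap one direct buffer for a heap buffer
// (both still in native order) and rerun the same loop.
ByteBuffer heap = ByteBuffer.allocate(256 * 1024).order(ByteOrder.nativeOrder());
ByteBuffer direct = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
for (int i = 0; i < 10; i++)
    runTest(heap, direct);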


To demonstrate that it is not the "JNI" call itself but the overhead around it that causes the delay, you can write the same loop using Unsafe directly.

import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

import sun.misc.Unsafe;
import sun.nio.ch.DirectBuffer;

public class UnsafeBenchmark {
    public static void main(String... args) {
        ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 10; i++)
            runTest(bb1, bb2);
    }

    private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
        Unsafe unsafe = getTheUnsafe();
        long start = System.nanoTime();
        long addr1 = ((DirectBuffer) bb1).address();
        long addr2 = ((DirectBuffer) bb2).address();
        // Copy bb2 into bb1 one int at a time, bypassing the ByteBuffer bounds checks.
        for (int i = 0, len = Math.min(bb1.capacity(), bb2.capacity()); i < len; i += 4)
            unsafe.putInt(addr1 + i, unsafe.getInt(addr2 + i));
        long time = System.nanoTime() - start;
        int operations = bb1.capacity() / 4 * 2; // one getInt plus one putInt per 4 bytes
        System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
    }

    public static Unsafe getTheUnsafe() {
        try {
            // Unsafe cannot be obtained via a public API, so pull it out via reflection.
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            return (Unsafe) theUnsafe.get(null);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }
}

This prints:

Each putInt/getInt took an average of 40.4 ns
Each putInt/getInt took an average of 44.4 ns
Each putInt/getInt took an average of 0.4 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns

So you can see that the native calls are much faster than you would expect of a JNI call. The main cause of what delay there is could well be the L2 cache speed. ;)

Everything was run on an i3 at 3.3 GHz.

On java - comparing get/put of direct and non-direct ByteBuffers, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11174231/
