multithreading - Hadoop Zlib vs. JDK Gzip performance comparison

Reposted. Author: 可可西里. Updated: 2023-11-01 16:41:35

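I am benchmarking single-threaded compression codecs, and Zlib's throughput looks noticeably higher than you would expect from a single thread. I use org.apache.hadoop.io.compress.zlib.ZlibCompressor for the Zlib compressor and java.util.zip.Deflater for the Gzip compressor it is compared against.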

Is the Zlib compressor (wrapper) in Hadoop somehow multithreaded, perhaps through its JNI interface?

Zlib:

import org.apache.hadoop.io.compress.zlib.*;

protected final ZlibCompressor zlibCompressor = new ZlibCompressor(
    ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION,
    ZlibCompressor.CompressionStrategy.DEFAULT_STRATEGY,
    ZlibCompressor.CompressionHeader.DEFAULT_HEADER,
    DEFAULT_BUFFER_SIZE);
protected final ZlibDecompressor zlibDecompressor = new ZlibDecompressor(
    ZlibDecompressor.CompressionHeader.DEFAULT_HEADER,
    DEFAULT_BUFFER_SIZE);

// compress: hand over the whole block, mark the end of input, then drain the output
zlibCompressor.setInput(uncompressed, 0, uncompressed.length);
zlibCompressor.finish();
int compressedLength = zlibCompressor.compress(compressBuffer, 0, compressBuffer.length);

// decompress: reset the decompressor state before feeding it a new block
zlibDecompressor.reset();
zlibDecompressor.setInput(compressed, 0, compressed.length);
int uncompressedLength = zlibDecompressor.decompress(uncompressBuffer, 0, uncompressBuffer.length);

Gzip:

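import java.io.*;
import java.util.zip.*;

// COMPRESSION_LEVEL (int) and NO_WRAP (boolean) are benchmark constants not shown here
protected final Deflater deflater = new Deflater(COMPRESSION_LEVEL, NO_WRAP);
protected final Inflater inflater = new Inflater(NO_WRAP);

// compress
int compressedLength = compressBlockUsingStream(uncompressed, compressBuffer);

// decompress: reuse the shared inflater after resetting its state
inflater.reset();
int uncompressedLength = uncompressBlockUsingStream(
    new InflaterInputStream(new ByteArrayInputStream(compressed), inflater), uncompressBuffer);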

Helper functions for Gzip:

// ByteArrayOutputStream here is presumably a harness-specific class that wraps a
// caller-supplied buffer and exposes length(); java.io.ByteArrayOutputStream has
// neither a byte[] constructor nor a length() method.
protected int compressBlockUsingStream(byte[] uncompressed, byte[] compressBuffer) throws IOException
{
    ByteArrayOutputStream out = new ByteArrayOutputStream(compressBuffer);
    compressToStream(uncompressed, out);
    return out.length();
}

protected int uncompressBlockUsingStream(InputStream in, byte[] uncompressBuffer) throws IOException
{
    ByteArrayOutputStream out = new ByteArrayOutputStream(uncompressBuffer);
    byte[] buffer = new byte[4096];
    int count;
    // copy until read() signals end of stream with -1
    while ((count = in.read(buffer)) >= 0) {
        out.write(buffer, 0, count);
    }
    in.close();
    out.close();
    return out.length();
}
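For readers who want to try the stream-based path without the benchmark harness, here is a minimal JDK-only sketch of the same two helpers. It is an illustration under assumptions, not the question's actual code: the class name StreamCodecSketch is hypothetical, it returns new arrays instead of writing into a caller-supplied buffer, and it uses the default zlib wrapper because the value of the NO_WRAP constant is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Hypothetical class name; a JDK-only stand-in for the stream helpers above.
class StreamCodecSketch {

    // Compresses one block through a DeflaterOutputStream at the given level.
    static byte[] compress(byte[] uncompressed, int level) {
        Deflater deflater = new Deflater(level); // assumption: default zlib wrapper
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (OutputStream zipped = new DeflaterOutputStream(out, deflater)) {
            zipped.write(uncompressed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // free the native zlib state
        }
        return out.toByteArray();
    }

    // Decompresses one block by copying from an InflaterInputStream in 4 KB chunks.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed), inflater)) {
            byte[] buffer = new byte[4096];
            int count;
            while ((count = in.read(buffer)) >= 0) {
                out.write(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Highly repetitive sample data so the size reduction is easy to see.
        byte[] original = new byte[64 * 1024];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) ('a' + i % 7);
        }
        byte[] compressed = compress(original, Deflater.DEFAULT_COMPRESSION);
        byte[] restored = decompress(compressed);
        System.out.println(original.length + " -> " + compressed.length
                + " bytes, round trip ok: " + Arrays.equals(original, restored));
    }
}
```

Note that this version allocates a new stream and output array for every block, which the block-oriented Zlib path does not do, so per-call overhead is part of what such a comparison measures.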

Throughput:

Zlib/block -- 143.902 MBps

Gzip/JDK/stream -- 22.573 MBps

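Does anyone know why zlib is so fast (is the native code using all cores)? The code is expected to run single-threaded. Has anyone been able to reproduce similar results?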

Best answer

java.util.zip uses zlib.

Are you sure you are using the same compression level in both? Is COMPRESSION_LEVEL equal to ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION?
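One way to check how much the level alone matters is to time the same single-threaded java.util.zip.Deflater at several levels on identical input. The sketch below is only an illustration under assumptions: the class name LevelComparisonSketch, the synthetic sample data, and the round count are all hypothetical, and a serious benchmark would add warm-up iterations and use the real workload.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical class name; compares Deflater levels on one thread.
class LevelComparisonSketch {

    // Compresses one block at the given level and returns the compressed size in bytes.
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] chunk = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(chunk); // output is discarded, only counted
            }
            return total;
        } finally {
            deflater.end(); // free the native zlib state
        }
    }

    // Measures single-threaded compression throughput in MB/s over several rounds.
    static double throughputMBps(byte[] input, int level, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            compressedSize(input, level);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (input.length * (double) rounds) / (1024.0 * 1024.0) / seconds;
    }

    // Builds deterministic, text-like sample data from a small vocabulary.
    static byte[] sampleData(int size) {
        String[] words = {"hadoop", "zlib", "deflate", "gzip", "codec", "block", "stream", "buffer"};
        Random random = new Random(42);
        StringBuilder text = new StringBuilder(size + 16);
        while (text.length() < size) {
            text.append(words[random.nextInt(words.length)]).append(' ');
        }
        return text.substring(0, size).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] input = sampleData(1 << 20);
        int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            System.out.printf("level %d: %d bytes, %.1f MB/s%n",
                    level, compressedSize(input, level), throughputMBps(input, level, 20));
        }
    }
}
```

If the throughput gap between levels on the JDK side is comparable to the gap reported in the question, a mismatched COMPRESSION_LEVEL is a more likely explanation than hidden multithreading.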

Regarding multithreading - Hadoop Zlib vs. JDK Gzip performance comparison, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40761915/
