
java - Understanding the poor performance of reading lines from stdin into a char[] via InputStreamReader

Reposted. Author: 行者123. Updated: 2023-12-02 13:29:03

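In the course of a competition to program a log filter (in several programming languages/techniques), I found Java's performance when reading from stdin to be rather poor.

As a first step, I narrowed the problem down to the performance of reading lines from stdin compared to the other techniques (no text processing or regular expressions yet).

Inspired by the answers to "Fastest way for line-by-line reading STDIN?", I wrote my own line reader, but it turned out about 1.3 times slower than the BufferedReader baseline.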



Implementation of the code under test



LineReader.java

package org.acme.logfilter;

import java.io.IOException;
import java.io.InputStreamReader;

public class LineReader {

    private static final int DEFAULT_READ_BUFFER_SIZE = 32768;
    private static final int INITIAL_LINE_BUFFER_SIZE = 128;

    private InputStreamReader isr;
    private int lineBufferSize;

    // To buffer the read from the input stream
    private char[] readBuffer;

    // The extracted line
    private char[] lineBuffer;

    // Bytes read from the input stream
    private int readBufferCapacity = 0;

    // Position in the read buffer
    private int readIdx = 0;

    // The line length remembered with the last readLine()
    private int lineLength = 0;

    public LineReader(InputStreamReader isr) {
        this(isr, DEFAULT_READ_BUFFER_SIZE);
    }

    public LineReader(InputStreamReader isr, int readBufferSize) {
        this.isr = isr;
        this.lineBufferSize = INITIAL_LINE_BUFFER_SIZE;

        this.readBuffer = new char[readBufferSize];
        this.lineBuffer = new char[lineBufferSize];
    }

    public boolean readLine() throws IOException {
        // Copy reference & value for slightly improved performance
        char[] readBuffer = this.readBuffer;
        // A local reference improves performance slightly
        int readIdx = this.readIdx;
        // Index of the (target) line array (equals to the line length)
        int lineIdx = 0;

        while (true) {
            if (readIdx == readBufferCapacity) {
                // Read buffer not filled yet or exceeded
                // (The line buffer might not be complete yet)

                // Reset the read buffer index (it has exceeded)
                readIdx = 0;

                // (Re)fill the buffer ...
                readBufferCapacity = isr.read(readBuffer, 0, readBuffer.length);

                if (readBufferCapacity <= 0) {
                    // Though the stream ended, we previously read a line
                    // without CR
                    return lineIdx > 0 ? true : false;
                }
            }

            if (lineIdx == lineBufferSize) {
                // Line buffer is full, create new buffer and "backup" line

                // Remember current buffer before creating new one
                char[] oldLineBuffer = lineBuffer;
                // Extend by initial size
                lineBufferSize += INITIAL_LINE_BUFFER_SIZE;
                lineBuffer = new char[lineBufferSize];

                // Copy incomplete line to the bigger buffer ...
                System.arraycopy(oldLineBuffer, 0, lineBuffer, 0, lineIdx);
            }

            char chr = readBuffer[readIdx];
            readIdx++;

            if (chr == '\n') {
                this.lineLength = lineIdx;
                // "Export" localized variables
                this.readIdx = readIdx;
                return true;
            }

            lineBuffer[lineIdx] = chr;
            lineIdx++;
        }
    }

    public char[] getLine() {
        return lineBuffer;
    }

    public int getLineLength() {
        return lineLength;
    }
}


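Notes on the code

For now it is acceptable that the code does not handle CRLF line endings correctly; that is not the issue here (it performs worse even though it does less). Working with a single char[] buffer is intentional: the idea is to avoid any StringBuffer or repeated char[] allocation overhead and copying. Since the consuming program only reads the strings and never modifies them, I thought it would be a good idea to wrap the char[] as a CharSequence so that the character sequence can be passed on to other methods.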
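One way such a CharSequence wrapper could look is sketched below. It relies on java.nio.CharBuffer, which implements CharSequence and can wrap an existing char[] without copying. The class name LineView and the helper view are hypothetical and not part of the original code; the question does not say how the author implemented the wrapper.

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

public class LineView {

    /**
     * Hypothetical helper: exposes the first `length` chars of `buffer`
     * as a read-only CharSequence. No characters are copied.
     */
    public static CharSequence view(char[] buffer, int length) {
        return CharBuffer.wrap(buffer, 0, length).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        // Simulate a line buffer that is larger than the current line
        char[] lineBuffer = new char[128];
        String sample = "ERROR disk full";
        sample.getChars(0, sample.length(), lineBuffer, 0);

        CharSequence line = view(lineBuffer, sample.length());

        // The view can be handed to any API that accepts a CharSequence
        boolean isError = Pattern.compile("^ERROR").matcher(line).find();
        System.out.println(line.length() + " chars, error line: " + isError);
    }
}
```

Because the view shares the underlying array, it is only valid until the next readLine() call overwrites the buffer.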

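I would never implement the log filter with code like this if it only gained a small performance advantage. It serves only to investigate how to improve on BufferedReader's poor performance.

Implementation of the test classes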

FilterLogStdBufferedReader.java

InputStreamReader isr = new InputStreamReader(System.in);
BufferedReader br = new BufferedReader(isr, 32768 * 1024);

String line;
long lines = 0;

while ((line = br.readLine()) != null) {
    lines++;
}


FilterLogCustomLineparserExt.java

InputStreamReader isr = new InputStreamReader(System.in);
LineReader reader = new LineReader(isr, 32768 * 1024);

long lines = 0;

while (reader.readLine()) {
    lines++;
}


Profiling results

time() results

$ time ( cat /ramdisk/1gb.txt | java -cp bin/ org.acme.logfilter.FilterLogStdBufferedReader )

real 8.10
user 6.08
sys 3.73


$ time ( cat /ramdisk/1gb.txt | java -cp bin/ org.acme.logfilter.FilterLogCustomLineparserExt )

real 9.49
user 7.92
sys 3.22


Averaged over 10 iterations. The input is a 1 GB file with 79 characters per line, read from a ramdisk.
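The question does not show how the input file was created. A comparable file could be produced with a small generator like the following sketch. The class name TestFileGenerator, the default file name, and the filler character are assumptions made for illustration only; each line is 79 ASCII characters plus '\n', so 80 bytes per line.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class TestFileGenerator {

    /** Writes `lines` lines of `width` ASCII characters each, terminated by '\n'. */
    public static void generate(Path target, long lines, int width) throws IOException {
        char[] line = new char[width + 1];
        Arrays.fill(line, 0, width, 'x'); // filler content, assumed for this sketch
        line[width] = '\n';
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.US_ASCII)) {
            for (long i = 0; i < lines; i++) {
                out.write(line, 0, line.length);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TestFileGenerator <file> <lineCount>
        Path target = Paths.get(args.length > 0 ? args[0] : "test-input.txt");
        long lines = args.length > 1 ? Long.parseLong(args[1]) : 1000L;
        generate(target, lines, 79);
        System.out.println(Files.size(target) + " bytes written to " + target);
    }
}
```

A uniform file like this only exercises the line-splitting path; real log data with varying line lengths may behave differently.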

-Xprof

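-Xprof gives an overview of how the JVM runs the code: how much time is spent interpreting code, executing JIT-compiled code, or executing native code.

Results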

FilterLogStdBufferedReader.java

Flat profile of 9.80 secs (768 total ticks): main

Interpreted + native Method
0.7% 5 + 0 org.acme.logfilter.FilterLogStdBufferedReader.main
0.4% 0 + 3 java.io.FileInputStream.available
0.4% 3 + 0 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
0.3% 2 + 0 java.io.BufferedReader.readLine
...
2.2% 13 + 4 Total interpreted

Compiled + native Method
45.3% 347 + 1 org.acme.logfilter.FilterLogStdBufferedReader.main
0.8% 6 + 0 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
0.5% 0 + 4 java.io.BufferedReader.readLine
0.4% 0 + 3 java.io.BufferedReader.readLine
...
47.3% 354 + 9 Total compiled

Stub + native Method
33.7% 0 + 259 java.io.FileInputStream.available
16.7% 0 + 128 java.io.FileInputStream.readBytes
0.1% 0 + 1 java.lang.System.arraycopy
50.5% 0 + 388 Total stub


Global summary of 9.80 seconds:
100.0% 777 Received ticks
1.2% 9 Received GC ticks
4.4% 34 Compilation


FilterLogCustomLineparserExt.java

Flat profile of 13.88 secs (1017 total ticks): main

Interpreted + native Method
0.3% 3 + 0 org.acme.logfilter.FilterLogCustomLineparserExt.main
0.2% 0 + 2 java.io.FileInputStream.available
0.2% 2 + 0 org.acme.logfilter.LineReader.readLine
0.2% 2 + 0 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
...
1.2% 10 + 2 Total interpreted

Compiled + native Method
57.7% 587 + 0 org.acme.logfilter.FilterLogCustomLineparserExt.main
1.7% 17 + 0 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
0.2% 1 + 1 org.acme.logfilter.LineReader.readLine
...
59.8% 606 + 2 Total compiled

Stub + native Method
24.0% 0 + 244 java.io.FileInputStream.available
14.8% 0 + 151 java.io.FileInputStream.readBytes
0.2% 0 + 2 java.lang.System.arraycopy
39.0% 0 + 397 Total stub


Global summary of 13.88 seconds:
100.0% 1018 Received ticks
2.7% 27 Compilation


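(For brevity I removed the blocks of lines with a percentage <= 0.1% and replaced them with "...".)

Observations/Conclusions

Observations:


The JVM spends more time compiling code for FilterLogStdBufferedReader (Compilation: 4.4% vs. 2.7% of ticks),
in FilterLogCustomLineparserExt the JVM spends more time executing compiled code than native code (59.8% compiled vs. 39.0% stub ticks),
sun.nio.cs.UTF_8$Decoder.decodeArrayLoop is called more often in FilterLogCustomLineparserExt, or is observed to be active for longer,
in both implementations the time spent interpreting code is negligible.


Conclusions:


LineReader cannot be optimized by getting the JVM to JIT-compile more code (and interpret less), and
LineReader should be optimized to do "less unnecessary" work, so that the (compiled) code does not "waste" so much time.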


hprof=cpu=times results

cpu=times counts the calls to each method and computes each method's contribution to the CPU time.

Results

BufferedReader

$ cat /ramdisk/1gb.txt | java -agentlib:hprof=cpu=times,file=stdbufferedreader.hprof.txt -cp bin/ org.acme.logfilter.FilterLogStdBufferedReader

CPU TIME (ms) BEGIN (total = 321694) Sat Aug 26 09:42:52 2017
rank self accum count trace method
1 28.49% 28.49% 13107201 301905 java.io.BufferedReader.readLine
2 17.69% 46.17% 13107201 301906 java.io.BufferedReader.readLine
3 17.59% 63.77% 13107154 301904 java.lang.String.<init>
4 10.07% 73.84% 1 302038 org.acme.logfilter.FilterLogStdBufferedReader.main
5 7.86% 81.70% 13107154 301903 java.util.Arrays.copyOfRange
6 7.31% 89.01% 13107201 301826 java.io.BufferedReader.ensureOpen
7 1.86% 90.87% 128061 301866 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
8 1.00% 91.87% 128001 301894 sun.nio.cs.StreamDecoder.readBytes
9 0.97% 92.84% 128001 301880 java.nio.HeapByteBuffer.compact
10 0.67% 93.51% 61 301898 sun.nio.cs.StreamDecoder.implRead
11 0.66% 94.17% 128001 301888 java.io.FileInputStream.read
12 0.48% 94.65% 128061 301849 sun.nio.cs.UTF_8.updatePositions
13 0.41% 95.07% 128001 301889 java.io.BufferedInputStream.read1
...


LineReader (custom implementation)

$ cat /ramdisk/1gb.txt | java -agentlib:hprof=cpu=times,file=custom.hprof.txt -cp bin/ org.acme.logfilter.FilterLogCustomLineparserExt

CPU TIME (ms) BEGIN (total = 103141) Sat Aug 26 09:39:02 2017
rank self accum count trace method
1 34.11% 34.11% 13107201 301921 org.acme.logfilter.LineReader.readLine
2 31.22% 65.32% 1 302011 org.acme.logfilter.FilterLogCustomLineparserExt.main
3 5.75% 71.07% 128040 301886 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
4 3.10% 74.17% 128001 301914 sun.nio.cs.StreamDecoder.readBytes
5 3.01% 77.18% 128001 301900 java.nio.HeapByteBuffer.compact
6 2.65% 79.83% 128001 301908 java.io.FileInputStream.read
7 2.10% 81.93% 40 301918 sun.nio.cs.StreamDecoder.implRead
8 1.46% 83.38% 128040 301869 sun.nio.cs.UTF_8.updatePositions
9 1.24% 84.63% 128040 301890 java.nio.charset.CharsetDecoder.decode
10 1.20% 85.83% 128001 301909 java.io.BufferedInputStream.read1
11 1.17% 86.99% 128040 301887 sun.nio.cs.UTF_8$Decoder.decodeLoop
12 0.91% 87.90% 127971 301916 java.io.BufferedInputStream.available
13 0.85% 88.76% 128001 301910 java.io.BufferedInputStream.read
14 0.61% 89.36% 127971 301917 sun.nio.cs.StreamDecoder.inReady
15 0.53% 89.90% 128040 301885 sun.nio.cs.UTF_8$Decoder.xflow
16 0.52% 90.42% 128040 301870 sun.nio.cs.UTF_8.access$200
17 0.48% 90.90% 256080 301867 java.nio.Buffer.position
18 0.46% 91.36% 256080 301860 java.nio.ByteBuffer.arrayOffset
19 0.44% 91.80% 256080 301861 java.nio.Buffer.position
20 0.44% 92.24% 256002 301894 java.nio.HeapByteBuffer.ix
21 0.43% 92.68% 256080 301862 java.nio.Buffer.limit
22 0.43% 93.11% 256002 301895 java.nio.Buffer.remaining
23 0.42% 93.53% 256080 301864 java.nio.CharBuffer.arrayOffset
...


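Observations/Conclusions

Observations:


The custom implementation spends more time in readLine().
CPU time for the custom implementation is about three times lower (total = 103141 vs. 321694).


Conclusions:


The custom implementation does not call native code unexpectedly often.
When timing the profiled runs, the CPU time values match the user times. I assume this is because the BufferedReader implementation runs longer under the profiler: it involves more code and therefore more instrumentation. This does not contradict the reversed run times without profiling.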




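Optimization attempts so far


Making lineIdx and readIdx local variables helped bring performance up to its current (still poor) level
Replacing the multiple getters with a CharSequence returned directly by readLine() (performance dropped significantly)


Questions

Is my interpretation of the profiler results correct?

Why does LineReader perform worse than BufferedReader, which creates StringBuffer instances over and over and keeps copying char[] data around?

How can the implementation be improved?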

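Best answer

Your LineReader implementation has a number of problems that keep it from performing optimally.


First, readLine is a large method with complicated control flow, which makes it hard for the JVM to apply optimizations.
lineBuffer is filled character by character, whereas a bulk copy would be faster.
When the readBuffer and lineBuffer arrays are accessed, there are no obvious constraints on the index variables, so the JVM emits an array bounds check for every array operation.


My suggestions are:


Use a short, separate loop to find the index of the \n character. It will benefit from many JIT optimizations such as loop unrolling, array bounds check elimination, better register allocation, and so on.
As soon as \n is found, fill lineBuffer with System.arraycopy.


Here is an example. It is not fully functional, but it should give you an idea of what this could look like.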

public boolean readLine() throws IOException {
    do {
        // cr is the index just past the '\n', or -1 if none was found
        int cr = findCR(readBuffer, readIdx, readBufferCapacity);
        if (cr >= 0) {
            // Line content excludes the '\n' itself
            lineLength = cr - readIdx - 1;
            // Bulk copy; assumes lineBuffer is large enough for the line
            System.arraycopy(readBuffer, readIdx, lineBuffer, 0, lineLength);
            readIdx = cr;
            return true;
        }
    } while (refill()); // refill() is not shown in this example
    return false;
}

// Despite its name, findCR searches for '\n' (LF)
private int findCR(char[] readBuffer, int pos, int limit) {
    // Ensuring that limit <= readBuffer.length helps JIT to eliminate array bounds check
    limit = Math.min(limit, readBuffer.length);
    while (pos < limit) {
        if (readBuffer[pos++] == '\n') {
            return pos;
        }
    }
    return -1;
}
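The example above leaves out refill() and the handling of lines that span two buffer fills or exceed lineBuffer. A minimal self-contained sketch of the same idea (a tight scan loop followed by System.arraycopy) might look as follows. The class name BulkLineReader and the helpers indexOfNewline and append are hypothetical and not taken from the answer; instead of a separate refill() it appends partial chunks to the line buffer. No performance claim is made for it, and like the original it ignores CRLF.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Arrays;

public class BulkLineReader {
    private final Reader in;
    private final char[] readBuffer;
    private char[] lineBuffer = new char[128];
    private int readIdx = 0;    // next unread position in readBuffer
    private int limit = 0;      // number of valid chars in readBuffer
    private int lineLength = 0; // length of the line in lineBuffer

    public BulkLineReader(Reader in, int readBufferSize) {
        this.in = in;
        this.readBuffer = new char[readBufferSize];
    }

    /** Reads the next line into the line buffer; returns false at end of input. */
    public boolean readLine() throws IOException {
        lineLength = 0;
        while (true) {
            if (readIdx == limit) {
                // Read buffer exhausted: refill it
                limit = in.read(readBuffer, 0, readBuffer.length);
                readIdx = 0;
                if (limit <= 0) {
                    limit = 0;
                    // A final line without trailing '\n' still counts
                    return lineLength > 0;
                }
            }
            int nl = indexOfNewline(readBuffer, readIdx, limit);
            int end = (nl >= 0) ? nl : limit;
            // Bulk-copy the chunk instead of copying char by char
            append(readBuffer, readIdx, end - readIdx);
            if (nl >= 0) {
                readIdx = nl + 1; // skip the '\n'
                return true;
            }
            readIdx = limit; // line continues in the next buffer fill
        }
    }

    // Short, separate scan loop with an explicit upper bound on the index
    private static int indexOfNewline(char[] buf, int from, int to) {
        to = Math.min(to, buf.length);
        for (int i = from; i < to; i++) {
            if (buf[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    // Appends count chars to the line buffer, growing it if necessary
    private void append(char[] src, int offset, int count) {
        int needed = lineLength + count;
        if (needed > lineBuffer.length) {
            lineBuffer = Arrays.copyOf(lineBuffer, Math.max(needed, lineBuffer.length * 2));
        }
        System.arraycopy(src, offset, lineBuffer, lineLength, count);
        lineLength = needed;
    }

    public char[] getLine() { return lineBuffer; }

    public int getLineLength() { return lineLength; }

    public static void main(String[] args) throws IOException {
        BulkLineReader reader = new BulkLineReader(new InputStreamReader(System.in), 64 * 1024);
        long lines = 0;
        while (reader.readLine()) {
            lines++;
        }
        System.out.println(lines);
    }
}
```

Whether this actually beats BufferedReader would have to be measured on the same input; the sketch only illustrates the structure the answer recommends.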


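Side notes


The buffer size is too large, which hurts CPU cache efficiency. A value somewhere between 32K and 256K should perform better.
Do not use hprof: it modifies the code being run and often produces distorted results. I believe async-profiler is more accurate; it also shows the time spent in native code and in the kernel.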
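To illustrate the buffer-size note: the test classes pass 32768 * 1024 as the buffer size, that is 32 Mi chars. The sketch below uses 64 KiB instead, an arbitrary pick from the suggested 32K to 256K range, and lets the size be overridden on the command line so different values can be timed. The class name CountLines and the explicit UTF-8 charset are assumptions for this example; the original code uses the platform default charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CountLines {

    // 64 Ki chars: an arbitrary value from the suggested 32K..256K range
    static final int BUFFER_SIZE = 64 * 1024;

    /** Counts the lines of `source` using a BufferedReader with the given buffer size. */
    public static long countLines(Reader source, int bufferSize) throws IOException {
        long lines = 0;
        try (BufferedReader br = new BufferedReader(source, bufferSize)) {
            while (br.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Optional first argument: buffer size in chars
        int bufferSize = args.length > 0 ? Integer.parseInt(args[0]) : BUFFER_SIZE;
        Reader stdin = new InputStreamReader(System.in, StandardCharsets.UTF_8);
        System.out.println(countLines(stdin, bufferSize));
    }
}
```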

A similar question on "java - Understanding the poor performance of reading lines from stdin into a char[] via InputStreamReader" can be found on Stack Overflow: https://stackoverflow.com/questions/45902998/
