
java - How to get progress for a large file using XMLStreamReader

Reposted. Author: 行者123. Updated: 2023-12-02 03:33:25

I am using the code below to read large XML files (gigabytes in size) in a Hadoop RecordReader with XMLStreamReader:

public class RecordReader {
    // fs and file come from the surrounding job setup (not shown in the question)
    private XMLStreamReader reader;
    int progressCount = 0;

    public RecordReader() throws IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        FSDataInputStream fdDataInputStream = fs.open(file); // HDFS file
        try {
            reader = factory.createXMLStreamReader(fdDataInputStream);
        } catch (XMLStreamException exception) {
            throw new RuntimeException("XMLStreamException exception : ", exception);
        }
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        // progressCount is never updated, which is the problem being asked about
        return progressCount;
    }
}

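My question is how to get the read progress of the file with XMLStreamReader, since it does not expose any start or end position from which to compute a progress percentage. I have looked at "How do I keep track of parsing progress of large files in StAX?", but I cannot use a FilterReader. Please help.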

Best answer

You can wrap the InputStream by extending FilterInputStream:

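import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public interface InputStreamListener {
    void onBytesRead(long totalBytes);
}

public class PublishingInputStream extends FilterInputStream {
    private final InputStreamListener listener;
    private long totalBytes = 0;

    public PublishingInputStream(InputStream in, InputStreamListener listener) {
        super(in);
        this.listener = listener;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            publish(1);
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates to this method, so overriding
    // it here also covers read(byte[]) without counting bytes twice.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int count = super.read(b, off, len);
        if (count > 0) { // -1 signals end of stream and must not be added
            publish(count);
        }
        return count;
    }

    private void publish(long count) {
        totalBytes += count;
        listener.onBytesRead(totalBytes);
    }
}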

Usage:

XMLInputFactory factory = XMLInputFactory.newInstance();
InputStream in = fs.open(file);
// someHadoopService is a placeholder from the answer: use any call that
// returns the length of the file in bytes
final long fileSize = someHadoopService.getFileLength(file);
InputStreamListener listener = new InputStreamListener() {
    @Override
    public void onBytesRead(long totalBytes) {
        System.out.println(String.format("Read %d of %d bytes", totalBytes, fileSize));
    }
};
InputStream publishingIn = new PublishingInputStream(in, listener);
try {
    reader = factory.createXMLStreamReader(publishingIn);
    // etc
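
The question asks for a value that getProgress() can return, so the byte count still has to be turned into a fraction. The following is a minimal sketch of one way to do that, not the answer author's code. The names ProgressSketch, CountingInputStream, fraction and parseAndReportProgress are hypothetical, and an in-memory byte array stands in for the HDFS file. Because the parser pulls bytes in buffered chunks, the count can run ahead of the events actually delivered, so the result is an approximation.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ProgressSketch {

    /** Hypothetical helper: counts the bytes pulled through the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        private long bytesRead = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                bytesRead++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                bytesRead += n;
            }
            return n;
        }

        long getBytesRead() {
            return bytesRead;
        }
    }

    /** Maps a byte count to the 0.0 to 1.0 range, guarding against an unknown length. */
    static float fraction(long bytesRead, long totalBytes) {
        if (totalBytes <= 0) {
            return 0f;
        }
        return Math.min(1.0f, (float) bytesRead / totalBytes);
    }

    /** Parses the whole document and returns the progress once parsing is done. */
    static float parseAndReportProgress(byte[] xml) {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(xml));
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(counting);
            while (reader.hasNext()) {
                reader.next();
                // A RecordReader could return
                // fraction(counting.getBytesRead(), fileLength) from getProgress() here.
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException("XMLStreamException exception : ", e);
        }
        return fraction(counting.getBytesRead(), xml.length);
    }

    public static void main(String[] args) {
        byte[] xml = "<root><item>1</item><item>2</item></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseAndReportProgress(xml));
    }
}
```

In a real RecordReader the total would be the length of the file (or of the input split) rather than xml.length, and the counting stream would wrap the stream returned by fs.open(file).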

Regarding java - How to get progress for a large file using XMLStreamReader, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37750796/
