
java.io.EOFException in HDFS 0.22.0


I'm reading bytes from a file with the following code:

FileSystem fs = config.getHDFS();
try {
    Path path = new Path(dirName + '/' + fileName);

    byte[] bytes = new byte[(int) fs.getFileStatus(path).getLen()];
    in = fs.open(path);

    in.read(bytes);
    result = new DataInputStream(new ByteArrayInputStream(bytes));
} catch (Exception e) {
    e.printStackTrace();
    if (in != null) {
        try {
            in.close();
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}

The directory I'm reading from contains roughly 15,000 files. After some point, this exception is thrown on the in.read(bytes) line:

2012-05-31 14:11:45,477 [INFO:main] (DFSInputStream.java:414) - Failed to connect to /165.36.80.28:50010, add to deadNodes and continue
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:298)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Status.read(DataTransferProtocol.java:115)
at org.apache.hadoop.hdfs.BlockReader.newBlockReader(BlockReader.java:427)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:725)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514)
at java.io.DataInputStream.read(DataInputStream.java:83)

The other exception that gets thrown is:

2012-05-31 15:09:14,849 [INFO:main] (DFSInputStream.java:414) - Failed to connect to /165.36.80.28:50010, add to deadNodes and continue
java.net.SocketException: No buffer space available (maximum connections reached?): connect
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:719)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514)
at java.io.DataInputStream.read(DataInputStream.java:83)

Please advise what the problem might be.

Best Answer

You're ignoring the return value of in.read and assuming you can read the whole file in a single call. Don't do that. Loop until read returns -1, or until you've read as much data as you asked for. It's also not clear that you should trust getLen() like this: what happens if the file grows (or shrinks) between the two calls?
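As a minimal sketch of that loop (assuming you do decide to trust getLen(), and reusing bytes and in from the snippet above):

int offset = 0;
while (offset < bytes.length) {
    int read = in.read(bytes, offset, bytes.length - offset);
    if (read == -1) {
        // The file turned out to be shorter than getLen() reported
        throw new EOFException("Stream ended after " + offset
                + " of " + bytes.length + " bytes");
    }
    offset += read;
}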

I suggest creating a ByteArrayOutputStream to write to, with a smaller (16K?) buffer as scratch space, and then looping: read into the buffer, write that many bytes to the output stream, lather, rinse, repeat until read returns -1 to signal the end of the stream. Then you can pull the data out of the ByteArrayOutputStream and feed it into a ByteArrayInputStream as before.

Edit: quick, untested code below. There's similar (better) code in Guava, by the way.

public static byte[] readFully(InputStream stream) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buffer = new byte[16 * 1024];
    int bytesRead;
    while ((bytesRead = stream.read(buffer)) > 0) {
        baos.write(buffer, 0, bytesRead);
    }
    return baos.toByteArray();
}

Then just use:

in = fs.open(path);
byte[] data = readFully(in);
result = new DataInputStream(new ByteArrayInputStream(data));
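As an aside, if Guava is on the classpath, its ByteStreams utility does the same job in one call:

// com.google.common.io.ByteStreams
byte[] data = ByteStreams.toByteArray(in); // reads until end of stream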

Also note that you should close your stream in a finally block, not only when an exception occurs. Failing to do so likely explains the second exception as well: each unclosed stream keeps its DataNode connection open, which matches the "No buffer space available (maximum connections reached?)" SocketException. I'd also advise against catching Exception itself.
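A sketch of that structure, keeping the variable names from the original snippet:

FSDataInputStream in = null;
try {
    in = fs.open(path);
    byte[] data = readFully(in);
    result = new DataInputStream(new ByteArrayInputStream(data));
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (in != null) {
        try {
            in.close();
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}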

Regarding java.io.EOFException in HDFS 0.22.0, there's a similar question on Stack Overflow: https://stackoverflow.com/questions/10839758/
