
hadoop - Hadoop job throws java.io.IOException: Attempted read from closed stream


I am running a simple map-reduce job. The job reads 250 files from the Common Crawl dataset.

For example: s3://aws-publicdatasets/common-crawl/parse-output/segment/1341690169105/
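
For reference, a minimal sketch of how a job of this shape might be wired up. The class names (CrawlJob, CrawlMapper) and the Text/Text record types are illustrative assumptions, not code from the question; the use of SequenceFileInputFormat and MultithreadedMapper is inferred from the stack trace below.

    // Minimal sketch; CrawlJob, CrawlMapper and the Text/Text types are assumptions.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CrawlJob {
        public static class CrawlMapper extends Mapper<Text, Text, Text, Text> {
            @Override
            protected void map(Text key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                // Real processing of the crawl record would go here.
                context.write(key, value);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "common-crawl-example");
            job.setJarByClass(CrawlJob.class);
            // The stack trace shows MultithreadedMapper wrapping the user mapper.
            job.setMapperClass(MultithreadedMapper.class);
            MultithreadedMapper.setMapperClass(job, CrawlMapper.class);
            job.setInputFormatClass(SequenceFileInputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            // One segment from the question; each file under it is one input.
            FileInputFormat.addInputPath(job, new Path(
                    "s3://aws-publicdatasets/common-crawl/parse-output/segment/1341690169105/"));
            FileOutputFormat.setOutputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }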

With 50 or 100 files, everything works fine. But with 250 files I get this error:

java.io.IOException: Attempted read from closed stream.
at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:159)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:107)
at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:76)
at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:136)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:111)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readByte(DataInputStream.java:248)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1707)
at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1773)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1849)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:74)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:180)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)

Any clues?

Best Answer

How many map slots do you have to process these files? Is it close to 100?

This is a guess, but the connection to S3 is probably timing out while the first batch of files is being processed, and by the time a slot becomes available to process the remaining files, the connection is no longer open. I believe timeout errors from NativeS3FileSystem surface as IOExceptions.
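
If that guess is right, two mitigations may be worth trying, sketched below under the assumption that the job uses MultithreadedMapper (which the stack trace suggests): reduce the number of concurrent sub-mapper threads so S3 streams are read soon after they are opened, and raise the S3 retry settings that NativeS3FileSystem honors in Hadoop 1.x (the values shown are illustrative; verify the keys against your Hadoop version).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    public class Mitigations {
        public static void configure(Job job) {
            // Fewer concurrent sub-mappers means S3 streams are opened closer to
            // when they are actually read, shrinking the idle-timeout window.
            MultithreadedMapper.setNumberOfThreads(job, 4);

            // In Hadoop 1.x, NativeS3FileSystem wraps its S3 store in a retry
            // proxy controlled by these keys; raising them may absorb transient
            // S3 failures (verify against your version).
            Configuration conf = job.getConfiguration();
            conf.setInt("fs.s3.maxRetries", 10);
            conf.setLong("fs.s3.sleepTimeSeconds", 10);
        }
    }

Lowering the thread count trades some overlap of network and CPU work for a smaller window in which an open S3 connection sits idle waiting for a free slot.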

Regarding hadoop - Hadoop job throws java.io.IOException: Attempted read from closed stream, see the original question on Stack Overflow: https://stackoverflow.com/questions/14203621/
