hadoop 警告 EBADF : Bad file descriptor-6ren

hadoop 警告 EBADF : Bad file descriptor

转载作者：可可西里更新时间：2023-11-01 14:17:01

我是 Hadoop 的新手，尝试使用 Hadoop 编写关系连接。该算法尝试在连续两轮中连接三个关系。我使用递归方法。该程序运行良好。但是在执行期间它会尝试打印这样的警告:

14/12/02 10:41:16 WARN io.ReadaheadPool: Failed readahead on ifile                                                                                                                  
EBADF: Bad file descriptor                                                                                                                                                          
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)                                                                                                
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:263)                                                                                   
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:142)                                                                  
        at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)                                                                                      
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)                                                                                          
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)                                                                                          
        at java.lang.Thread.run(Thread.java:745)

这很烦人，我想知道问题的原因以及如何摆脱它们。我的代码如下:

public class Recursive {  
    /**
     * Join three relations together using recursive method
     * R JOIN S JOIN T = ((R JOIN S) JOIN T)
     */
    static String[] relationSequence;           // Keeps sequence of relations in join
    static int round;                           // Round number running
    /**
     * Mapper
     * Relation name = R
     * Input tuple   = a    b
     * Output pair   = (b, (R,a))
     * We assume that join value is the last attribute for the first relation
     * and the first attribute for the second relation.
     * So using this assumption, this map-reduce algorithm will work for any number of attributes  
     */
    public static class joinMapper extends Mapper<Object, Text, IntWritable, Text>{
        public void map(Object keyIn, Text valueIn, Context context) throws IOException, InterruptedException {
            // Read tuple and put attributes in a string array
            String curValue = valueIn.toString();
            String[] values = curValue.split("\t");
            // Get relation name from input file name
            String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
            // Get join attribute index number R join S
            int joinIndex;
            String others = "";
            if(fileName.compareTo(relationSequence[round])==0){
                joinIndex = 0;
                others = curValue.substring(0+2);
            }else{
                joinIndex = values.length - 1;
                others = curValue.substring(0, curValue.length()-2);
            }
            IntWritable joinValue = new IntWritable(Integer.parseInt(values[joinIndex]));

            // Create list of attributes which are not join attribute
            Text temp = new Text(fileName + "|" + others);
            context.write(joinValue,temp);
        }
    }

    /**
     * Reducer
     * 
     *  1. Divide the input list in two ArrayLists based on relation name:
     *      a. first relation
     *      b. second relation
     *  2. Test if the second relation is not empty. If it's so, we shouldn't continue.
     *  3. For each element of the first array list, join it with the all elements in
     *     the second array list
     */
    public static class joinReducer extends Reducer<IntWritable, Text, Text, Text>{
        public void reduce(IntWritable keyIn, Iterable<Text> valueIn, Context context)
                throws IOException, InterruptedException{
            ArrayList<String> firstRelation = new ArrayList<String>();
            ArrayList<String> secondRelation = new ArrayList<String>();
            for (Text value : valueIn) {
                String[] values = value.toString().split("\\|");
                if(values[0].compareTo(relationSequence[round])==0){
                    secondRelation.add(values[1]);
                }else{
                    firstRelation.add(values[1]);
                }
            }
            if(secondRelation.size()>0){
                for (String firstItem : firstRelation) {
                    for (String secondItem : secondRelation) {
                        context.write(new Text(firstItem.toString()), new Text(keyIn.toString() + "\t"
                                                                            + secondItem.toString() 
                                                                            ));
                    }
                }
            }
        }

    }

    /**
     * Partitioner
     * 
     * In order to hash pairs to reducer tasks, we used logical which is 
     * obviously faster than module function.
     */
    public static class joinPartitioner extends Partitioner<IntWritable, Text> {
        public int getPartition(IntWritable key, Text value, int numReduceTasks) {
                int partitionNumber = key.get()&0x007F;
                return partitionNumber;
            }
     }

     /**
      * Main method
      * 
      * (R join S join T)
      * hadoop jar ~/COMP6521.jar Recursive /input/R /input/S /input2/T /output R,S,T
      * 
      * @param args
      * <br> args[0]: first relation
      * <br> args[1]: second relation
      * <br> args[2]: third relation
      * <br> args[3]: output directory
      * <br> args[4]: relation sequence to join, separated by comma
      */
    public static void main(String[] args) throws IllegalArgumentException, IOException, InterruptedException, ClassNotFoundException {
        long s = System.currentTimeMillis();
        /****** Preparing problem variables *******/
        relationSequence = args[4].split(",");      // Keep sequence of relations
        round = 1;                                  // Variable to keep current round number
        int maxOfReducers = 128;                    // Maximum number of available reducers
        int noReducers;                             // Number of reducers for one particular job
        noReducers = maxOfReducers;

        Path firstRelation  = new Path(args[0]);
        Path secondRelation = new Path(args[1]);
        Path thirdRelation  = new Path(args[2]);
        Path temp           = new Path("/temp");    // Temporary path to keep intermediate result
        Path out            = new Path(args[3]);
        /****** End of variable Preparing   *******/

        Configuration conf = new Configuration();

        /****** Configuring first job *******/
//      General configuration
        Job job = Job.getInstance(conf, "Recursive multi-way join (first round)");
        job.setNumReduceTasks(noReducers);

//      Pass appropriate classes
        job.setJarByClass(Recursive.class);
        job.setMapperClass(joinMapper.class);
        job.setPartitionerClass(joinPartitioner.class);
        job.setReducerClass(joinReducer.class);

//      Specify input and output type of reducers
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        FileSystem fs = FileSystem.get(conf);
        if(fs.exists(temp)){ fs.delete(temp, true);}
        if(fs.exists(out)) { fs.delete(out, true); }

//      Specify the input and output paths  
        FileInputFormat.addInputPath(job, firstRelation);
        FileInputFormat.addInputPath(job, secondRelation);
        FileOutputFormat.setOutputPath(job, temp);
        /****** End of first job configuration *******/
        job.submit();
//      Running the first job
        boolean b = job.waitForCompletion(true);
        if(b){
//          try to execute the second job after completion of the first one
            round++;                    // Specify round number
            Configuration conf2 = new Configuration();  // Create new configuration object

            /****** Configuring second job *******/
//          General configuration
            Job job2 = Job.getInstance(conf2, "Reduce multi-way join (second round)");
            job2.setNumReduceTasks(noReducers);

//          Pass appropriate classes
            job2.setJarByClass(Recursive.class);
            job2.setMapperClass(joinMapper.class);
            job2.setPartitionerClass(joinPartitioner.class);
            job2.setReducerClass(joinReducer.class);

//          Specify input and output type of reducers
            job2.setOutputKeyClass(Text.class);
            job2.setOutputValueClass(Text.class);

//          Specify input and output type of mappers
            job2.setMapOutputKeyClass(IntWritable.class);
            job2.setMapOutputValueClass(Text.class);
//          End of 2014-11-25
//          Specify the input and output paths  
            FileInputFormat.addInputPath(job2, temp);
            FileInputFormat.addInputPath(job2, thirdRelation);
            FileOutputFormat.setOutputPath(job2, out);
            /****** End of second job configuration *******/
            job2.submit();
//          Running the first job
            b = job2.waitForCompletion(true);

//          Output time measurement
            long e = System.currentTimeMillis() - s;
            System.out.println("Total: " + e);
            System.exit(b ? 0 : 1);
        }
        System.exit(1);
    }

}

最佳答案

我有一个类似的错误，我结束了你的问题，这个 mail list thread EBADF: Bad file descriptor

To clarify a little bit, the readahead pool can sometimes spit out this message if you close a file while a readahead request is in flight. It's not an error and just reflects the fact that the file was closed hastily, probably because of some other bug which is the real problem.

在我的例子中，我关闭了一个 writer 而没有用 hflush

刷新它

由于您似乎没有手动使用 writer 或 reader，我可能会看看您是如何发送 mr 任务的。

关于hadoop 警告 EBADF : Bad file descriptor，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/27253616/

文章推荐： windows - OpenCV 使用 CMake 在 Windows 上安装 opencv_contrib 模块

文章推荐： c++ - 在卡萨布兰卡获取 HTTP 请求的 IP

file - access to file to files tomcat的conf文件夹下的一个文件
我想知道是否可以访问放在 tomcat 的 conf 文件夹中的文件。通常我会在这个文件中放置多个 webapp 的配置，在 war 之外。我想使用类路径独立于文件系统。我过去使用过 lib 文件
PowerShell ForEach $file in $Files 中的每个 $file
我有一个 PowerShell 脚本，它获取文件列表并移动满足特定条件的文件。为什么即使对象为空，foreach 循环也会运行？我假设如果 $i 不存在，它就不会运行。但是如果 $filePath
java - File file = new File () 的路径错误
我已将 BasicAccountRule.drl 放置在我的 Web 应用程序中，位置为:C:/workspace/exim_design/src/main/resources/rules/drl/i
ruby - File.open ('file.txt' ) 与 File.open ('file.txt' ).readlines
我使用 File.open('file.txt').class 和 File.open('file.txt').readlines.class 以及前者进行了检查一个返回 File，后者返回 Arra
java - 即使 file.exists()、file.canRead()、file.canWrite()、file.canExecute() 都返回 true，file.delete() 也会返回 false
我正在尝试使用 FileOutputStream 删除文件，在其中写入内容后。这是我用来编写的代码: private void writeContent(File file, String fileC
python - FileNotFoundException :File file:/path/to/file/in. txt不存在或者运行Flink的用户没有足够的权限访问它
我正在尝试使用 flink 和 python 批处理 api 测试 Wordcount 经典示例。我的问题是，将数据源从 env.from_elements() 修改为 env.read_text()
c - 通过函数 : FILE* or FILE**? 的 FILE* 数组
我正在尝试制作一个可以同时处理多个不同文件的程序。我的想法是制作一个包含 20 个 FILE* 的数组，以便在我达到此限制时能够关闭其中一个并打开请求的新文件。为此，我想到了一个函数，它选择一个选项
linux - 狂欢 : Search Contents of File A in File B and Print lines of File A in File C
我有两个文件A和B文件A: 976464 792992 文件B TimeStamp,Record1,976464,8383,ABCD 我想搜索文件 A 和文件 B 中的每条记录并打印匹配的记录。打印的
java - 使用 Java 8 流将 Map 转换为 Map>
我有一些保存在 map 中的属性文件。示例: Map map = new HashMap<>(); map.put("1", "One"); map.put("2", "Two"); map.put(
file - Unix/庆典 : Reading A List of Files and Merge Them To A File
我正在尝试找出一个脚本文件，该文件接受一个包含文件列表的文件(每一行都是一个文件路径，即 path/to/file)并将它们合并到一个文件中。例如: list.text -- path/to/fil
c# - File.CreateText/File.AppendText 与 File.AppendAllText
为了使用 File.CreateText() 和 File.AppendText() 你必须: 通过调用这些方法之一打开流写消息关闭流处理流为了使用 File.AppendAllText()
Using rsync to rename files during copying with --files-from?(在复制过程中使用rsync重命名文件--files-from？)
使用rsync时，如何在使用--files-from参数复制时重命名文件？我有大约190，000个文件，在从源复制到目标时，每个文件都需要重命名。我计划将文件列表放在一个文本文件中传递给--files
java - "file:d:\\dir1\file.xml"和 "file:/d:\\dir1\file.xml"作为 FileSystemXmlApplicationContext 参数
我在非服务器应用程序中使用 Spring(只需从 Eclipse 中某个类的 main() 编译并运行它)。我的问题是作为 new FileSystemXmlApplicationContext 的
ksh - "test -a file"和 "test file -ef file"的区别
QNX (Neutrino 6.5.0) 使用 ksh 的开源实现作为其 shell 。许多提供的脚本，包括系统启动脚本，都使用诸如 if ! test /dev/slog -ef /dev/slog
PHP : Excel cannot open the file because the file format or file extension is not valid
当我尝试打开从我的应用程序下载的 xls 文件时，出现此错误: excel cannot open the file because the file format or file extension
c - "file pointer"、 "stream"、 "file descriptor"和... "file"之间的区别？
有一些相关的概念，即文件指针、流和文件描述符。我知道文件指针是指向数据类型 FILE 的指针(在例如 FILE.h 和 struct_FILE.h 中声明)。我知道文件描述符是 int ，例如成员
file - Groovy(文件IO): find all files and return all files - the Groovy way
好吧，这应该很容易... 我是groovy的新手，我希望实现以下逻辑: def testFiles = findAllTestFiles(); 到目前为止，我想出了下面的代码，该代码可以成功打印所有文
PowerShell:为什么 "Get-Content | Out-File -Append "会进入循环？
我理解为什么以下内容会截断文件的内容: Get-Content | Out-File 这是因为 Out-File 首先运行，它会在 Get-Content 有机会读取文件之前清空文件。但是当我尝
file - 类型错误 : invalid file: When trying to make a file name a variable
您好，我正在尝试将文件位置表示为变量，因为最终脚本将在另一台机器上运行。这是我尝试过的代码，然后是我得到的错误。在我看来，python 是如何添加“\”的，这就是导致问题的原因。如果是这种情况，我如何
bash - 一行文件的 "$(cat file)"、 "$(
我有一个只包含一行的输入文件: $ cat input foo bar 我想在我的脚本中使用这一行，据我所知有 3 种方法: line=$(cat input) line=$( input"...,

可可西里

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

hadoop 警告 EBADF : Bad file descriptor