java - Hadoop stdout始终为空，写入的字节为零-6ren

java - Hadoop stdout始终为空，写入的字节为零

转载作者：行者123 更新时间：2023-12-02 21:39:48

我正在尝试在MapReduce上执行Weka并且标准输出始终为空

这是运行整个程序的类。它负责
为了从用户那里获得输入，设置映射器和化简器，
组织weka输入等

public class WekDoop {



     * The main method of this program. 
     * Precondition: arff file is uploaded into HDFS and the correct
     * number of parameters were passed into the JAR file when it was run
     * 
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Make sure we have the correct number of arguments passed into the program
        if (args.length != 4) {
          System.err.println("Usage: WekDoop <# of splits> <classifier> <input file> <output file>");
          System.exit(1);
        }

        // configure the job using the command line args
        conf.setInt("Run-num.splits", Integer.parseInt(args[0]));
        conf.setStrings("Run.classify", args[1]);
        conf.set("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization," + "org.apache.hadoop.io.serializer.WritableSerialization");

        // Configure the jobs main class, mapper and reducer
        // TODO: Make the Job name print the name of the currently running classifier
        Job job = new Job(conf, "WekDoop");
        job.setJarByClass(WekDoop.class);
        job.setMapperClass(WekaMap.class);
        job.setReducerClass(WekaReducer.class);

        // Start with 1
        job.setNumReduceTasks(1);

        // This section sets the values of the <K2, V2>
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(weka.classifiers.bayes.NaiveBayes.class);
        job.setOutputValueClass(AggregateableEvaluation.class);

        // Set the input and output directories based on command line args
        FileInputFormat.addInputPath(job, new Path(args[2]));
        FileOutputFormat.setOutputPath(job, new Path(args[3]));

        // Set the input type of the environment
        // (In this case we are overriding TextInputFormat)
        job.setInputFormatClass(WekaInputFormat.class);

        // wait until the job is complete to exit
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

映射器类

此类是weka分类器的映射器
给它提供了数据块，并设置了一个分类器以在该数据上运行。
该方法中还发生许多其他处理。

    public  class WekaMap extends Mapper<Object, Text, Text, AggregateableEvaluation> {
    private Instances randData = null;
    private Classifier cls = null;

    private AggregateableEvaluation eval = null;
    private Classifier clsCopy = null;

    // Run 10 mappers
    private String numMaps = "10";

    // TODO: Make sure this is not hard-coded -- preferably a command line arg
    // Set the classifier
    private String classname = "weka.classifiers.bayes.NaiveBayes";
    private int seed = 20;

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        System.out.println("CURRENT LINE: " + line);

        //line = "/home/ubuntu/Workspace/hadoop-1.1.0/hadoop-data/spambase_processed.arff";

        Configuration conf = new Configuration();
        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path("/home/hduser/very_small_spam.arff");

        // Make sure the file exists...
        if (!fileSystem.exists(path)) {
            System.out.println("File does not exists");
            return;
        }

        JobID test = context.getJobID();
        TaskAttemptID tid = context.getTaskAttemptID();

        // Set up the weka configuration
        Configuration wekaConfig = context.getConfiguration();
        numMaps = wekaConfig.get("Run-num.splits");
        classname = wekaConfig.get("Run.classify");

        String[] splitter = tid.toString().split("_");
        String jobNumber = "";
        int n = 0;

        if (splitter[4].length() > 0) {
            jobNumber = splitter[4].substring(splitter[4].length() - 1);
            n = Integer.parseInt(jobNumber);
        }

        FileSystem fs = FileSystem.get(context.getConfiguration());

        System.out.println("PATH: " + path);

        // Read in the data set
        context.setStatus("Reading in the arff file...");
        readArff(fs, path.toString());
        context.setStatus("Done reading arff! Initializing aggregateable eval...");

        try {
            eval = new AggregateableEvaluation(randData);
        }
        catch (Exception e1) {
            e1.printStackTrace();
        }

        // Split the data into two sets: Training set and a testing set
        // this will allow us to use a little bit of data to train the classifier
        // before running the classifier on the rest of the dataset
        Instances trainInstance = randData.trainCV(Integer.parseInt(numMaps), n);
        Instances testInstance = randData.testCV(Integer.parseInt(numMaps), n);

        // Set parameters to be passed to the classifiers
        String[] opts = new String[3];
        if (classname.equals("weka.classifiers.lazy.IBk")) {
            opts[0] = "";
            opts[1] = "-K";
            opts[2] = "1";
        }
        else if (classname.equals("weka.classifiers.trees.J48")) {
            opts[0] = "";
            opts[1] = "-C";
            opts[2] = "0.25";
        }
        else if (classname.equals("weka.classifiers.bayes.NaiveBayes")) {
            opts[0] = "";
            opts[1] = "";
            opts[2] = "";
        }
        else {
            opts[0] = "";
            opts[1] = "";
            opts[2] = "";
        }

        // Start setting up the classifier and its various options
        try {
          cls = (Classifier) Utils.forName(Classifier.class, classname, opts);
        }
        catch (Exception e) {
            e.printStackTrace();
        }

        // These are all used for timing different processes
        long beforeAbstract = 0;
        long beforeBuildClass = 0;
        long afterBuildClass = 0;
        long beforeEvalClass = 0;
        long afterEvalClass = 0;

        try {
            // Create the classifier and record how long it takes to set up 
            context.setStatus("Creating the classifier...");
            System.out.println(new Timestamp(System.currentTimeMillis()));
            beforeAbstract = System.currentTimeMillis();
            clsCopy = AbstractClassifier.makeCopy(cls);
            beforeBuildClass = System.currentTimeMillis();
            System.out.println(new Timestamp(System.currentTimeMillis()));

            // Train the classifier on the training set and record how long this takes
            context.setStatus("Training the classifier...");
            clsCopy.buildClassifier(trainInstance);
            afterBuildClass = System.currentTimeMillis();
            System.out.println(new Timestamp(System.currentTimeMillis()));
            beforeEvalClass = System.currentTimeMillis();

            // Run the classifer on the rest of the data set and record its duration as well
            context.setStatus("Evaluating the model...");
            eval.evaluateModel(clsCopy, testInstance);
            afterEvalClass = System.currentTimeMillis();
            System.out.println(new Timestamp(System.currentTimeMillis()));

            // We are done this iteration!
            context.setStatus("Complete");
        }
        catch (Exception e) {
            System.out.println("Debugging strarts here!");
            e.printStackTrace();
        }

        // calculate the total times for each section
        long abstractTime = beforeBuildClass - beforeAbstract;
        long buildTime = afterBuildClass - beforeBuildClass;
        long evalTime = afterEvalClass - beforeEvalClass;

        // Print out the times
        System.out.println("The value of creation time: " + abstractTime);
        System.out.println("The value of Build time: " + buildTime);
        System.out.println("The value of Eval time: " + evalTime);

        context.write(new Text(line), eval);
      }

    /**
     * This can be used to write out the results on HDFS, but it is not essential
     * to the success of this project. If time allows, we can implement it.
     */
      public void writeResult() {    

      }


      /**
       * This method reads in the arff file that is provided to the program.
       * Nothing really special about the way the data is handled.
       * 
       * @param fs
       * @param filePath
       * @throws IOException
       * @throws InterruptedException
       */
      public void readArff(FileSystem fs, String filePath) throws IOException, InterruptedException {
          BufferedReader reader;
          DataInputStream d;
          ArffReader arff;
          Instance inst;
          Instances data;

          try {
              // Read in the data using a ton of wrappers
              d = new DataInputStream(fs.open(new Path(filePath)));
              reader = new BufferedReader(new InputStreamReader(d));
              arff = new ArffReader(reader, 100000);
              data = arff.getStructure();
              data.setClassIndex(data.numAttributes() - 1);

              // Add each line to the input stream
              while ((inst = arff.readInstance(data)) != null) {
                  data.add(inst);
              }

              reader.close();

              Random rand = new Random(seed);
              randData = new Instances(data);
              randData.randomize(rand);

              // This is how weka handles the sampling of the data
              // the stratify method splits up the data to cross validate it
              if (randData.classAttribute().isNominal()) {
                  randData.stratify(Integer.parseInt(numMaps));
              }
          }
          catch (IOException e) {
              e.printStackTrace();
          }
    }
}

reducer 类

此类是weka分类器输出的简化器
从映射器和它的映射器中获得了一堆交叉验证的数据块
工作是将数据聚合到一个解决方案中。

 public  class WekaReducer extends Reducer<Text, AggregateableEvaluation, Text, IntWritable> {
     Text result = new Text();
     Evaluation evalAll = null;
     IntWritable test = new IntWritable();

     AggregateableEvaluation aggEval;

    /**
     * The reducer method takes all the stratified, cross-validated
     * values from the mappers in a list and uses an aggregatable evaluation to consolidate
     * them.
     */
    public void reduce(Text key, Iterable<AggregateableEvaluation> values, Context context) throws IOException, InterruptedException {      
        int sum = 0;

        // record how long it takes to run the aggregation
        System.out.println(new Timestamp(System.currentTimeMillis()));
        long beforeReduceTime = System.currentTimeMillis();

        // loop through each of the values and "aggregate"
        // which basically means to consolidate the values
        for (AggregateableEvaluation val : values) {
            System.out.println("IN THE REDUCER!");

            // The first time through, give aggEval a value
            if (sum == 0) {
                try {
                    aggEval = val;
                }
                catch (Exception e) {
                    e.printStackTrace();
                }
            }
            else {
                // combine the values
                aggEval.aggregate(val);
            }

            try {
                // This is what is taken from the mapper to be aggregated
                System.out.println("This is the map result");
                System.out.println(aggEval.toMatrixString());
            }
            catch (Exception e) {
                e.printStackTrace();
            }                       

            sum += 1;
        }

        // Here is where the typical weka matrix output is generated
        try {
            System.out.println("This is reduce matrix");
            System.out.println(aggEval.toMatrixString());
        }
        catch (Exception e) {
            e.printStackTrace();
        }

        // calculate the duration of the aggregation
        context.write(key, new IntWritable(sum));
        long afterReduceTime = System.currentTimeMillis();
        long reduceTime = afterReduceTime - beforeReduceTime;

        // display the output
        System.out.println("The value of reduce time is: " + reduceTime);
        System.out.println(new Timestamp(System.currentTimeMillis()));
    }
}

最后是InputFormatClass

取得一个JobContext并返回分成几部分的数据列表
基本上，这是处理大型数据集的一种方式。这种方法允许
我们将大数据集拆分为较小的块，以跨工作节点传递
(或者在我们的例子中，只是为了使生活更轻松一些，并将这些块传递给一个
节点，这样它就不会被一个大数据集淹没)

    public class WekaInputFormat extends TextInputFormat {

    public List<InputSplit> getSplits(JobContext job) throws IOException {
        long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
        long maxSize = getMaxSplitSize(job);

        List<InputSplit> splits = new ArrayList<InputSplit>();
        for (FileStatus file: listStatus(job)) {
            Path path = file.getPath();
            FileSystem fs = path.getFileSystem(job.getConfiguration());

            //number of bytes in this file
            long length = file.getLen();
            BlockLocation[] blkLocations = fs.getFileBlockLocations(file, 0, length);

            // make sure this is actually a valid file
            if(length != 0) {
                // set the number of splits to make. NOTE: the value can be changed to anything
                int count = job.getConfiguration().getInt("Run-num.splits", 1);
                for(int t = 0; t < count; t++) {
                    //split the file and add each chunk to the list
                    splits.add(new FileSplit(path, 0, length, blkLocations[0].getHosts())); 
                }
            }
            else {
                // Create empty array for zero length files
                splits.add(new FileSplit(path, 0, length, new String[0]));
            }
        }
        return splits;
    }
}

最佳答案

对于每个映射器，reducer和整体作业，都有一个stderr文件，stdout文件和syslog文件。

您正在打印到映射器和化简器中的stdout，因此您应该检查映射器和化简器的stdout文件而不是整个作业的文件。

祝你好运

关于java - Hadoop stdout始终为空，写入的字节为零，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/29586423/

文章推荐： hadoop - Mahout:为什么我们将输入数据转换为序列文件？

文章推荐： playframework-2.0 - Play 2 中的 CSRF 过滤器实现

文章推荐： java - VisualVM 无法打开本地 VM 进行分析

文章推荐： regex - 在Pig中使用正则表达式解析日志文件

c# - 字节 + 字节 = 未知结果
美好的一天!我试图添加两个字节变量并注意到奇怪的结果。 byte valueA = 255; byte valueB = 1; byte valueC = (byte)(valueA + valueB
ios - 转换[字节]？到[字节]
嗨，我是 swift 的新手，我正在尝试解码以 [Byte] 形式发回给我的字节数组？当我尝试使用 if let string = String(bytes: d, encoding: .utf8)
postgresql - 由于 IPV6 需要 128 位(16 字节)那么为什么在 postgres CIDR 数据类型中存储为 24 字节(8.1)和 19 字节(9.1)？
我正在使用 ipv4 和 ipv6 存储在 postgres 数据库中。因为 ipv4 需要 32 位(4 字节)而 ipv6 需要 128(16 字节)位。那么为什么在 postgres 中 CI
string - []字节(字符串)与[]字节(*字符串)
我很好奇为什么 Go 不提供 []byte(*string) 方法。从性能的角度来看，[]byte(string) 不会复制输入参数并增加更多成本(尽管这看起来很奇怪，因为字符串是不可变的，为什么要复
客户端发送 500 字节，但服务器接收 244 字节 - 套接字编程？
我正在尝试为UDP实现Stop-and-Wait ARQ。根据停止等待约定，我在 0 和 1 之间切换 ACK。正确的 ACK 定义为正确的序列号(0 或 1)AND消息长度。以下片段是我的代码的
php - filesize() 始终读取 0 字节，即使文件大小不是 0 字节
我在下面写了一些代码，目前我正在测试，所以代码中没有数据库查询。下面的代码显示 if(filesize($filename) != 0) 总是转到 else，即使文件不是 0 字节而是 16 字节那
java - 无法读取整个 header ；读取 0 字节；预计 512 字节
我使用 Apache poi 3.8 来读取 xls 文件，但出现异常: java.io.IOException: Unable to read entire header; 0 by
python - 为什么在调用 .clear() 后字典大小为 72 字节，而实例化时为 240 字节？
字典大小为 72 字节(根据 getsizeof(dict) 在字典上调用 .clear() 之后发生了什么，当新实例化的字典返回 240 字节时？我知道一个简单的 dict 的起始大小为“8”，并
c - 将 4 字节 int 交织到 8 字节 int
我目前正在努力创建一个函数，它接受两个 4 字节无符号整数，并返回一个 8 字节无符号长整数。我试图将我的工作基于 this research 描述的方法，但我的所有尝试都没有成功。我正在处理的具体输
c++ - 将 4 字节 int 解释为 4 字节 float
看看这个简单的程序: #include using namespace std; int main() { unsigned int i=0x3f800000; float* p=(float*)(
java - Java 中的字符串 "8000000000000000"(16 字节)相当于 "BCD"(8 字节)
我创建了自己的函数，将一个字符串转换为其等效的 BCD 格式的 bytes[]。然后我将此字节发送到 DataOutputStram (使用需要 byte[] 数组的写入方法)。问题出在数字字符串“8
c - 带有静态堆的小块内存分配器(典型值 <= 16 字节，稀有值 >= 64 字节，最大值 = 192)
此分配器将在具有静态内存的嵌入式系统中使用(即，没有可用的系统堆，因此“堆”将只是“char heap[4096]”) 周围似乎有很多“小型内存分配器”，但我正在寻找能够处理非常小的分配的一个。我说的
sql-server - 警告!最大 key 长度为 900 字节。索引的最大长度为 1000 字节
我将数据库脚本从 64 位系统传输到 32 位系统。当我执行脚本时，出现以下错误， Warning! The maximum key length is 900 bytes. The index 'U
linux - 128 字节 Ext2 和 256 字节 Ext3 的 inode 数据结构差异
想知道 128 字节 ext2 和 256 字节 ext3 文件系统之间的 inode 数据结构差异。我一直在为 ext2、128 字节 inode 使用此引用:http://www.nongnu.
java - Cassandra = 内存/编码- key 占用空间(哈希/字节[]=>十六进制=>UTF16=>字节[])
我试图理解使用 MD5 哈希作为 Cassandra key 在“内存/存储消耗”方面的含义: 我的内容(在 Java 中)的 MD5 哈希 = byte[] 长 16 个字节。 (16 字节来自维基
linux - 需要帮助 - 出现错误 : xrealloc: subst. c:4072: 无法重新分配 1073741824 字节(已分配 0 字节)
检查其他人是否也遇到类似问题。 shell脚本中的代码: ## Convert file into Unix format first. ## THIS is IMPORTANT. ###
c++ - x86 4 字节 float 与 8 字节 double (与 long long 相比)？
我们有一个测量数据处理应用程序，目前所有数据都保存为 C++ float，这意味着在我们的 x86/Windows 平台上为 32 位/4 字节。 (32 位 Windows 应用程序)。由于精度成
java - Long 的大小为 8 字节，那么在 JAVA 中如何将 'promoted' 转换为 float (4 字节)？
我读到在 Java 中 long 类型可以提升为 float 和 double ( http://www.javatpoint.com/method-overloading-in-java )。我想问
python - 将 n 个元素(大小 = 2 字节，十进制)的列表拆分为 2n 个元素(大小 = 1 字节，十六进制)
我有一个包含 n 个十进制元素的列表，其中每个元素都是两个字节长。可以说: x = [9000 , 5000 , 2000 , 400] 这个想法是将每个元素拆分为 MSB 和 LSB 并将其存储在
1 个 block (16 字节)的 Java AES-128 加密返回 2 个 block (32 字节)作为输出
我使用以下代码进行 AES-128 加密来编码一个 16 字节的 block ，但编码值的长度给出了 2 个 32 字节的 block 。我错过了什么吗？ plainEnc = AES.enc

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

java - Hadoop stdout始终为空，写入的字节为零