
java - Hadoop MapReduce result is not writing any data to the output file

Reposted. Author: 行者123. Updated: 2023-12-02 21:07:54

I have been trying to debug this error for a while. Basically, I have confirmed that my reduce class writes the correct output to its context, but for some reason I always end up with a zero-byte output file.

My mapper class:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FrequencyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Document t = Jsoup.parse(value.toString());
        String text = t.body().text();
        String[] content = text.split(" ");

        for (String s : content) {
            context.write(new Text(s), new IntWritable(1));
        }
    }
}

My reducer class:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FrequencyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int n = 0;
        for (IntWritable i : values) {
            n++;
        }
        if (n > 5) { // Do we need this check?
            context.write(key, new IntWritable(n));
            System.out.println("<" + key + ", " + n + ">");
        }
    }
}

And my driver:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FrequencyMain {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration(true);

        // Set up the job
        Job job = Job.getInstance(conf, "FrequencyCount");
        job.setJarByClass(FrequencyMain.class);

        job.setMapperClass(FrequencyMapper.class);
        job.setCombinerClass(FrequencyReducer.class);
        job.setReducerClass(FrequencyReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

For some reason, "Reduce output records" is always 0:
Job complete: job_local805637130_0001
Counters: 17
Map-Reduce Framework
Spilled Records=250
Map output materialized bytes=1496
Reduce input records=125
Map input records=6
SPLIT_RAW_BYTES=1000
Map output bytes=57249
Reduce shuffle bytes=0
Reduce input groups=75
Combine output records=125
Reduce output records=0
Map output records=5400
Combine input records=5400
Total committed heap usage (bytes)=3606577152
File Input Format Counters
Bytes Read=509446
FileSystemCounters
FILE_BYTES_WRITTEN=385570
FILE_BYTES_READ=2909134
File Output Format Counters
Bytes Written=8

Best Answer

(Assuming your goal is to print the frequencies of words whose frequency is greater than 5.)

The current implementation of the combiner completely breaks the semantics of the program. You need to either remove it or reimplement it:

  • Currently the combiner passes on to the reducer only words whose frequency is at least 6. The combiner runs per mapper, which means that if, say, only a single document is scheduled to some mapper, then that mapper/combiner will not emit any word whose frequency in that document is less than 6 (even if the same word occurs frequently in other documents handled by other mappers). You need to remove the check n > 5 in the combiner (but keep it in the reducer).
  • Because the reducer's input values are now not necessarily all "1" (the combiner emits partial counts), you should increment n by each value's amount (n += i.get()) instead of by one (n++).
  • Regarding "java - Hadoop MapReduce result is not writing any data to the output file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41218169/
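The second bullet can be illustrated without Hadoop at all. Below is a minimal plain-Java sketch (the class name CombinerSketch and the sample partial counts are made up for illustration) of why the reduce body must sum the incoming values rather than count them once a combiner is in play:

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {

    // Corrected reduce/combine logic: sum the incoming partial counts.
    static int sum(Iterable<Integer> values) {
        int n = 0;
        for (int v : values) {
            n += v; // the question's reducer did n++ here, which miscounts
        }
        return n;
    }

    public static void main(String[] args) {
        // Suppose two mappers each saw a word 3 times; after combining,
        // the reducer receives two partial counts of 3, not six 1s.
        List<Integer> partialCounts = Arrays.asList(3, 3);

        int correct = sum(partialCounts);   // sums values -> 6
        int buggy = partialCounts.size();   // n++ per value -> 2

        System.out.println("summed: " + correct + ", counted: " + buggy);
    }
}
```

With the buggy n++ logic, the true frequency of 6 collapses to 2, which then fails the n > 5 check and is why nothing reaches the output file.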
