
java - Hadoop MapReduce: context.write changes values

Reposted. Author: 可可西里. Updated: 2023-11-01 16:37:27

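I am new to Hadoop and am writing a MapReduce job. I have run into a problem where the reducer's context.write method seems to change correct values into incorrect ones.

What should the MapReduce job do?

  • Count the total number of words (int wordCount)
  • Count the number of distinct words (int counter_dist)
  • Count the number of words that start with "z" or "Z" (int counter_startZ)
  • Count the words that appear fewer than 4 times (int counter_less4)

All of this must be done in a single MapReduce job.

The text file being analyzed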

Hello how zou zou zou zou how are you

Expected output:
wordCount = 9
counter_dist = 5
counter_startZ = 4
counter_less4 = 4
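
As a cross-check of the expected values above, here is a minimal plain-Java sketch that computes the same four statistics for the sample line without Hadoop. The class name WordStatsSketch and the method compute are hypothetical and are not part of the original job; the TreeMap only stands in for the grouping that the shuffle phase performs.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop): hypothetical helper, not part of the original job.
class WordStatsSketch {

    // Returns {totalWords, distinctWords, startsWithZ, fewerThan4}.
    static int[] compute(String text) {
        // Group tokens by word, roughly as the shuffle groups mapper output by key.
        Map<String, Integer> occurrences = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            occurrences.merge(itr.nextToken(), 1, Integer::sum);
        }

        int total = 0;
        int startsWithZ = 0;
        int fewerThan4 = 0;
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            int n = e.getValue();
            total += n;
            if (e.getKey().startsWith("z") || e.getKey().startsWith("Z")) {
                startsWithZ += n; // counts occurrences, not distinct words
            }
            if (n < 4) {
                fewerThan4++; // one per distinct word seen fewer than 4 times
            }
        }
        return new int[] {total, occurrences.size(), startsWithZ, fewerThan4};
    }

    public static void main(String[] args) {
        int[] s = compute("Hello how zou zou zou zou how are you");
        System.out.println("wordCount = " + s[0]);      // 9
        System.out.println("counter_dist = " + s[1]);   // 5
        System.out.println("counter_startZ = " + s[2]); // 4
        System.out.println("counter_less4 = " + s[3]);  // 4
    }
}
```

Note that "zou" appears exactly 4 times, so it is not counted as appearing fewer than 4 times.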

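Mapper class

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (word, 1) for every whitespace-separated token in the input line.
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            context.write(word, one);
        }
    }
}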

Reducer class
To debug my code, I print many statements to check the values at each point. The stdout log is provided below.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    int wordCount = 0;      // Total number of words
    int counter_dist = 0;   // Number of distinct words in the corpus
    int counter_startZ = 0; // Number of words that start with letter Z
    int counter_less4 = 0;  // Number of words that appear less than 4

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int repeatedWords = 0;
        System.out.println("###Reduce method starts");
        System.out.println("Values: wordCount:" + wordCount + " counter_dist:" + counter_dist + " counter_startZ:" + counter_startZ + " counter_less4:" + counter_less4 + " (start)");
        for (IntWritable val : values) {
            System.out.println("Key: " + key.toString());
            repeatedWords++;
            wordCount += val.get();
            if (key.toString().startsWith("z") || key.toString().startsWith("Z")) {
                counter_startZ++;
            }
            System.out.println("Values: wordCount:" + wordCount + " counter_dist:" + counter_dist + " counter_startZ:" + counter_startZ + " counter_less4:" + counter_less4 + " (end of loop)");
        }
        counter_dist++;

        if (repeatedWords < 4) {
            counter_less4++;
        }

        System.out.println("Values: wordCount:" + wordCount + " counter_dist:" + counter_dist + " counter_startZ:" + counter_startZ + " counter_less4:" + counter_less4 + " repeatedWords:" + repeatedWords + " (end)");
        System.out.println("###Reduce method ends\n");
    }

    // Writes the accumulated totals once, at the end of the task.
    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        System.out.println("###CLEANUP: wordCount: " + wordCount);
        System.out.println("###CLEANUP: counter_dist: " + counter_dist);
        System.out.println("###CLEANUP: counter_startZ: " + counter_startZ);
        System.out.println("###CLEANUP: counter_less4: " + counter_less4);

        context.write(new Text("Total words: "), new IntWritable(wordCount));
        context.write(new Text("Distinct words: "), new IntWritable(counter_dist));
        context.write(new Text("Starts with Z: "), new IntWritable(counter_startZ));
        context.write(new Text("Appears less than 4 times:"), new IntWritable(counter_less4));
    }
}

Stdout log, which I am using for debugging

###Reduce method starts
Values: wordCount:0 counter_dist:0 counter_startZ:0 counter_less4:0 (start)
Key: Hello
Values: wordCount:1 counter_dist:0 counter_startZ:0 counter_less4:0 (end of loop)
Values: wordCount:1 counter_dist:1 counter_startZ:0 counter_less4:1 repeatedWords:1 (end)
###Reduce method ends

###Reduce method starts
Values: wordCount:1 counter_dist:1 counter_startZ:0 counter_less4:1 (start)
Key: are
Values: wordCount:2 counter_dist:1 counter_startZ:0 counter_less4:1 (end of loop)
Values: wordCount:2 counter_dist:2 counter_startZ:0 counter_less4:2 repeatedWords:1 (end)
###Reduce method ends

###Reduce method starts
Values: wordCount:2 counter_dist:2 counter_startZ:0 counter_less4:2 (start)
Key: how
Values: wordCount:3 counter_dist:2 counter_startZ:0 counter_less4:2 (end of loop)
Key: how
Values: wordCount:4 counter_dist:2 counter_startZ:0 counter_less4:2 (end of loop)
Values: wordCount:4 counter_dist:3 counter_startZ:0 counter_less4:3 repeatedWords:2 (end)
###Reduce method ends

###Reduce method starts
Values: wordCount:4 counter_dist:3 counter_startZ:0 counter_less4:3 (start)
Key: you
Values: wordCount:5 counter_dist:3 counter_startZ:0 counter_less4:3 (end of loop)
Values: wordCount:5 counter_dist:4 counter_startZ:0 counter_less4:4 repeatedWords:1 (end)
###Reduce method ends

###Reduce method starts
Values: wordCount:5 counter_dist:4 counter_startZ:0 counter_less4:4 (start)
Key: zou
Values: wordCount:6 counter_dist:4 counter_startZ:1 counter_less4:4 (end of loop)
Key: zou
Values: wordCount:7 counter_dist:4 counter_startZ:2 counter_less4:4 (end of loop)
Key: zou
Values: wordCount:8 counter_dist:4 counter_startZ:3 counter_less4:4 (end of loop)
Key: zou
Values: wordCount:9 counter_dist:4 counter_startZ:4 counter_less4:4 (end of loop)
Values: wordCount:9 counter_dist:5 counter_startZ:4 counter_less4:4 repeatedWords:4 (end)
###Reduce method ends

###CLEANUP: wordCount: 9
###CLEANUP: counter_dist: 5
###CLEANUP: counter_startZ: 4
###CLEANUP: counter_less4: 4

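Judging from the log, all values appear to be correct and everything works. However, when I open the output directory in HDFS and read the "part-r-00000" file, the context.write output written there is completely different.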

Total words: 22
Distinct words: 4
Starts with Z: 0
Appears less than 4 times: 4

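Best answer

You should never rely on the cleanup() method for critical program logic. cleanup() is called once at the end of each task, as that task is being torn down. Since you cannot predict how many tasks and JVMs the framework spawns and terminates, logic that depends on fields accumulated inside a single task instance becomes unstable.

Move both the initialization and the context writes into the reduce method.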
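
One way the reported file contents could arise, offered only as a hypothesis: the driver is not shown in the question, so the sketch below assumes that WordCountReducer was also registered as the combiner. In that case the four summary records written by the first pass would be grouped and reduced again by the same logic. The plain-Java simulation below (no Hadoop; the class CombinerReplaySketch and its methods are hypothetical) replays the question's counting logic twice to show what that would produce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java simulation; assumes (unconfirmed) that the reducer also ran as a combiner.
class CombinerReplaySketch {

    // Groups every token with a list of 1s, like mapper output after the shuffle.
    static Map<String, List<Integer>> groupWords(String text) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            grouped.computeIfAbsent(itr.nextToken(), k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Turns summary records back into grouped input for a second pass.
    static Map<String, List<Integer>> regroup(Map<String, Integer> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : records.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Same counting logic as the question's reduce() plus cleanup().
    static Map<String, Integer> runPass(Map<String, List<Integer>> grouped) {
        int wordCount = 0, counterDist = 0, counterStartZ = 0, counterLess4 = 0;
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int repeatedWords = 0;
            for (int val : e.getValue()) {
                repeatedWords++;
                wordCount += val;
                if (e.getKey().startsWith("z") || e.getKey().startsWith("Z")) {
                    counterStartZ++;
                }
            }
            counterDist++;
            if (repeatedWords < 4) {
                counterLess4++;
            }
        }
        Map<String, Integer> out = new LinkedHashMap<>();
        out.put("Total words: ", wordCount);
        out.put("Distinct words: ", counterDist);
        out.put("Starts with Z: ", counterStartZ);
        out.put("Appears less than 4 times:", counterLess4);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = runPass(groupWords("Hello how zou zou zou zou how are you"));
        Map<String, Integer> second = runPass(regroup(first));
        System.out.println(first);  // first pass: 9, 5, 4, 4
        System.out.println(second); // second pass: 22, 4, 0, 4
    }
}
```

The second pass sums the four summary values (9 + 5 + 4 + 4 = 22), sees four distinct keys, none starting with "z", each appearing once. That matches the part-r-00000 contents in the question, which is consistent with the hypothesis but does not prove it.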

int wordCount = 0; // Total number of words
int counter_dist = 0; // Number of distinct words in the corpus
int counter_startZ = 0; // Number of words that start with letter Z
int counter_less4 = 0; // Number of words that appear less than 4

context.write(new Text("Total words: "), new IntWritable(wordCount));
context.write(new Text("Distinct words: "), new IntWritable(counter_dist));
context.write(new Text("Starts with Z: "), new IntWritable(counter_startZ));
context.write(new Text("Appears less than 4 times:"), new IntWritable(counter_less4));

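Edit: based on the OP's comments, it seems the whole logic is flawed.

Below is code that achieves the expected result. Note that I have not implemented setup() or cleanup(), because they are not needed at all.

Use counters to count what you are looking for. Once the MapReduce job finishes, simply fetch the counters in the driver class.

For example, the total number of words and the number of words that start with "z" or "Z" can be counted in the mapper: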

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            // Every token counts toward the total.
            context.getCounter("my_counters", "TOTAL_WORDS").increment(1);
            // Case-insensitive check for a leading "z".
            if (hasKey.toUpperCase().startsWith("Z")) {
                context.getCounter("my_counters", "Z_WORDS").increment(1);
            }
            context.write(word, one);
        }
    }
}

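In the reducer, counters can count the number of distinct words and the number of words that appear fewer than 4 times:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int wordCount = 0;
        // reduce() is called once per distinct key, so each call is one distinct word.
        context.getCounter("my_counters", "DISTINCT_WORDS").increment(1);
        for (IntWritable val : values) {
            wordCount += val.get();
        }
        if (wordCount < 4) {
            context.getCounter("my_counters", "WORDS_LESS_THAN_4").increment(1);
        }
    }
}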

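Fetch the counters in the driver class. The following code goes after the line that submits the job and waits for it to finish (for example, job.waitForCompletion(true)):

// Requires org.apache.hadoop.mapreduce.Counter and org.apache.hadoop.mapreduce.CounterGroup.
CounterGroup group = job.getCounters().getGroup("my_counters");

for (Counter counter : group) {
    System.out.println(counter.getName() + "=" + counter.getValue());
}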
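
To illustrate why counters are more robust than fields held in one task instance, here is a minimal plain-Java sketch. It assumes the input is split between two map tasks; the split point, the class CounterMergeSketch and its methods are hypothetical stand-ins, not the Hadoop Counter API. Each simulated task keeps its own counts, and the totals are obtained by summing counters with the same name, which is roughly what the framework does before the driver reads them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java stand-in for per-task counters; not the Hadoop Counter API.
class CounterMergeSketch {

    // Counts one input split the way the mapper above increments its counters.
    static Map<String, Long> countTask(String split) {
        Map<String, Long> counters = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(split);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            counters.merge("TOTAL_WORDS", 1L, Long::sum);
            if (token.toUpperCase().startsWith("Z")) {
                counters.merge("Z_WORDS", 1L, Long::sum);
            }
        }
        return counters;
    }

    // Sums counters with the same name across all tasks.
    static Map<String, Long> mergeAll(List<Map<String, Long>> perTask) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> counters : perTask) {
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical split of the sample line across two map tasks.
        Map<String, Long> task1 = countTask("Hello how zou zou");
        Map<String, Long> task2 = countTask("zou zou how are you");
        Map<String, Long> merged = mergeAll(Arrays.asList(task1, task2));
        System.out.println("TOTAL_WORDS=" + merged.get("TOTAL_WORDS")); // 9
        System.out.println("Z_WORDS=" + merged.get("Z_WORDS"));         // 4
    }
}
```

Only the mapper-side counters are simulated here. The distinct-word and fewer-than-4 counts depend on grouping by key, which is why the answer increments them in the reducer.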

Regarding java - Hadoop MapReduce: context.write changes values, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49140121/
