
java - Why doesn't the reducer work correctly in my case?

Reposted. Author: 可可西里. Updated: 2023-11-01 15:43:16

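Why does "set" end up with only one element? For example, the first 5 input lines share the same URL and have four different IPs, so the set should contain 4 elements. I also tried a for-each loop instead of the iterator, but that did not help. Can anyone help me?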

Mapper

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, Text> {

        private Text IP = new Text();
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] tokens = line.split(",");
            word.set(tokens[2]);
            IP.set(tokens[0]);
            context.write(word, IP);
        }
    }

Reducer

    public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            Set<String> set = new HashSet<String>();
            Iterator<Text> iterator = values.iterator();
            while (iterator.hasNext()) {
                set.add(iterator.next().toString());
            }
            int a = set.size();
            String str = String.format("%d", a);
            context.write(key, new Text(str));
        }
    }

Job

    public static void main(String[] args) throws Exception {
        Job job = new Job();
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Input

"10.131.0.1","[29/Nov/2017:14:31:33","GET / HTTP/1.1","200"
"10.131.0.2","[29/Nov/2017:14:31:38","GET / HTTP/1.1","200"
"10.131.0.3","[29/Nov/2017:14:31:56","GET / HTTP/1.1","200"
"10.131.0.4","[29/Nov/2017:14:32:02","GET / HTTP/1.1","404"
"10.131.0.5","[29/Nov/2017:16:31:39","GET / HTTP/1.1","200"
"10.131.0.1","[29/Nov/2017:14:05:35","GET /contest.php HTTP/1.1","200"
"10.131.0.2","[29/Nov/2017:14:05:38","GET /contest.php HTTP/1.1","200"
"10.131.0.3","[29/Nov/2017:14:05:50","GET /contest.php HTTP/1.1","404"
"10.131.0.1","[29/Nov/2017:13:51:41","GET /login.php HTTP/1.1","200"
"10.131.0.2","[29/Nov/2017:13:51:49","GET /login.php HTTP/1.1","200"
"10.131.0.1","[29/Nov/2017:13:51:46","GET /contestproblem.php?name=RUET%20OJ%20Server%20Testing%20Contest HTTP/1.1","200"
"10.131.0.8","[29/Nov/2017:13:51:46","GET /contestproblen.php?name=RUET%20OJ%20Server%20Testing%20Contest HTTP/1.1","200"

My result is

"GET / HTTP/1.1"    1
"GET /contest.php HTTP/1.1" 1
"GET /contestproblem.php?name=RUET%20OJ%20Server%20Testing%20Contest HTTP/1.1" 1
"GET /contestproblen.php?name=RUET%20OJ%20Server%20Testing%20Contest HTTP/1.1" 1
"GET /login.php HTTP/1.1" 1

Best answer

The Reducer works fine, but the Combiner is not doing what you think it is. With the Combiner enabled, this is what happens:

Mapper output:

("GET / HTTP/1.1", "10.131.0.1")
("GET / HTTP/1.1", "10.131.0.2")

Combiner input:

("GET / HTTP/1.1", {"10.131.0.1", "10.131.0.2"})

Combiner output:

("GET / HTTP/1.1", "2") //You have the right answer here...

Reducer input:

("GET / HTTP/1.1", {"2"}) //...but then it gets passed into the Reducer again

Reducer output:

("GET / HTTP/1.1", "1")

Only one element (the count emitted by the Combiner) reaches the Reducer, so the set has size 1 and the output is "1".
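The effect can be reproduced without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class CombinerPitfallDemo and its methods are hypothetical stand-ins that copy the counting logic of IntSumReducer and apply it twice, once as the Combiner and once as the Reducer, to the five IPs that request "GET / HTTP/1.1" in the input.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java stand-in for the job; no Hadoop classes are involved.
class CombinerPitfallDemo {

    // Same logic as IntSumReducer.reduce: count distinct values, emit the count as text.
    static String reduce(List<String> values) {
        Set<String> set = new HashSet<>(values);
        return String.format("%d", set.size());
    }

    // With the Combiner enabled, the Reducer only sees what the Combiner emitted.
    static String runWithCombiner(List<String> mapperValues) {
        String combinerOutput = reduce(mapperValues);   // a count, no longer an IP
        return reduce(Arrays.asList(combinerOutput));   // one distinct value in the list
    }

    public static void main(String[] args) {
        List<String> ips = Arrays.asList(
                "10.131.0.1", "10.131.0.2", "10.131.0.3", "10.131.0.4", "10.131.0.5");
        System.out.println("combiner output: " + reduce(ips));          // prints 5
        System.out.println("reducer output: " + runWithCombiner(ips));  // prints 1
    }
}
```

The second call counts the distinct values in a list that holds only the string "5", which is why every key ends up with 1.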

Remove the Combiner (delete the line job.setCombinerClass(IntSumReducer.class);) and the job will produce the expected counts.
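As a rough illustration of why removing the Combiner helps, the plain-Java sketch below compares two setups on made-up sample data (two hypothetical map-task splits, not taken from the question's input). The first has no Combiner. The second uses a hypothetical Combiner, dedupCombiner, that only drops duplicates and still emits IPs, so its output has the same meaning as the Mapper output. This second variant is not part of the original answer; it is an assumption about how a Combiner could be kept safely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;

// Plain-Java stand-in for the job; no Hadoop classes are involved.
class CombinerFixDemo {

    // Reducer logic from the question: the number of distinct IPs for one key.
    static int countDistinct(List<String> values) {
        return new HashSet<>(values).size();
    }

    // Hypothetical combiner: removes duplicates within one map task but still emits IPs.
    static List<String> dedupCombiner(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        // Made-up values for a single key, produced by two map tasks.
        List<String> split1 = Arrays.asList("10.131.0.1", "10.131.0.2", "10.131.0.1");
        List<String> split2 = Arrays.asList("10.131.0.2", "10.131.0.3");

        // No combiner: the reducer receives every mapper value unchanged.
        List<String> raw = new ArrayList<>(split1);
        raw.addAll(split2);
        System.out.println("no combiner: " + countDistinct(raw));          // prints 3

        // Dedup combiner per split: fewer values are shuffled, same final count.
        List<String> combined = new ArrayList<>(dedupCombiner(split1));
        combined.addAll(dedupCombiner(split2));
        System.out.println("dedup combiner: " + countDistinct(combined));  // prints 3
    }
}
```

Both paths agree because the Reducer still receives IPs in either case. Hadoop may run a Combiner zero or more times, so its output has to be valid Reducer input no matter how often it is applied.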

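Other suggested changes:

  1. Have the Reducer output an IntWritable instead of converting the number to Text.
  2. Make the Set a Set<Text> instead of a Set<String>, to save the Text -> String conversion. Note that Hadoop reuses the same Text instance while iterating over the values, so each value would have to be copied (for example with new Text(value)) before it is added to the set.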

Regarding "java - Why doesn't the reducer work correctly in my case?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55942266/
