
java - Hadoop Map Reduce - nested loop in reduce ignores the Text result when writing Iterable values to the context


I'm new to Hadoop and I'm trying to run MapReduce on a simple input file (see the example below).
I'm using two nested for loops to build a kind of Cartesian product from a list of values, and for some reason the result value I get is always empty.
I tried tweaking it, and in the end it only works if I set the result Text while iterating (which also seems strange to me).
I'd appreciate any help in understanding the problem; I'm probably doing something wrong.

This is the input file I have:

A 1
B 2
C 1
D 2
C 2
E 1

I would like to get the following output:
1 A-C, A-E, C-E
2 B-C, B-D, C-D

So I tried to implement the following MapReduce classes:
public class DigitToPairOfLetters {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, Text> {

        private Text digit = new Text();
        private Text letter = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                letter.set(itr.nextToken());
                digit.set(itr.nextToken());
                context.write(digit, letter);
            }
        }
    }

    public static class DigitToLetterReducer
            extends Reducer<Text, Text, Text, Text> {

        private Text result = new Text();

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> valuesList = new ArrayList<>();
            for (Text value : values) {
                valuesList.add(value.toString());
            }
            StringBuilder builder = new StringBuilder();
            for (int i = 0; i < valuesList.size(); i++) {
                for (int j = i + 1; j < valuesList.size(); j++) {
                    builder.append(valuesList.get(i)).append(" ")
                           .append(valuesList.get(j)).append(",");
                }
            }
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "digit to letter");
        job.setJarByClass(DigitToPairOfLetters.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(DigitToLetterReducer.class);
        job.setReducerClass(DigitToLetterReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

But this code gives me the following output, with empty lists:
1
2

When I add setting the result inside the for loops, it seems to work:
public class DigitToPairOfLetters {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, Text> {

        private Text digit = new Text();
        private Text letter = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                letter.set(itr.nextToken());
                digit.set(itr.nextToken());
                context.write(digit, letter);
            }
        }
    }

    public static class DigitToLetterReducer
            extends Reducer<Text, Text, Text, Text> {

        private Text result = new Text();

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> valuesList = new ArrayList<>();
            for (Text value : values) {
                valuesList.add(value.toString());
                // TODO: We set the valuesList in the result since otherwise the
                // hadoop process will ignore the values in it.
                result.set(valuesList.toString());
            }
            StringBuilder builder = new StringBuilder();
            for (int i = 0; i < valuesList.size(); i++) {
                for (int j = i + 1; j < valuesList.size(); j++) {
                    builder.append(valuesList.get(i)).append(" ")
                           .append(valuesList.get(j)).append(",");
                    // TODO: We set the builder every iteration in the loop since
                    // otherwise the hadoop process will ignore the values
                    result.set(builder.toString());
                }
            }
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "digit to letter");
        job.setJarByClass(DigitToPairOfLetters.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(DigitToLetterReducer.class);
        job.setReducerClass(DigitToLetterReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

This gives me the following result:
1   [A C,A E,C E]
2 [B C,B D,C D]

I'd appreciate your help.

Best Answer

Your first approach is fine; you just need to add the line

result.set(builder.toString());

right before

context.write(key, result);

just as you did in your second version.

context.write is what flushes the output. Because result is still an empty Text object at that point, no value is passed along with the key, so only the key is written. You therefore need to set the value (A-E, etc.) on result before passing it.
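For reference, here is a minimal sketch of the reducer with just that one line added. It is shown as a standalone class so it compiles on its own; in your job it can stay a nested static class exactly as in the question, and the separator appended between the two letters is whatever you prefer (e.g. "-" to get the A-C style output you described):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DigitToLetterReducer extends Reducer<Text, Text, Text, Text> {

    private Text result = new Text();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Copy the values first; the Iterable can only be traversed once.
        List<String> valuesList = new ArrayList<>();
        for (Text value : values) {
            valuesList.add(value.toString());
        }

        // Build every pair (i, j) with i < j.
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < valuesList.size(); i++) {
            for (int j = i + 1; j < valuesList.size(); j++) {
                builder.append(valuesList.get(i))
                       .append("-")
                       .append(valuesList.get(j))
                       .append(",");
            }
        }

        // The missing step: copy the built string into the Text object
        // before writing, otherwise an empty value is emitted with the key.
        result.set(builder.toString());
        context.write(key, result);
    }
}

One more thing to keep an eye on, unrelated to the empty value: job.setCombinerClass(DigitToLetterReducer.class) runs this same pairing logic on partial map output, and since its output is not the same shape as its input, the reducer may end up pairing already-paired strings. Dropping the combiner line is the safer choice here.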

A similar question about java - Hadoop Map Reduce - nested loop in reduce ignoring the Text result when writing Iterable<Text> values to the context can be found on Stack Overflow: https://stackoverflow.com/questions/52181442/
