java - Hadoop Map Reduce - Nested loop in reduce ignores the Text result when writing Iterable<Text> values to the context


Reposted by: 行者123. Last updated: 2023-12-02 20:33:31

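I'm new to Hadoop, and I'm trying to run a MapReduce job on a simple input file (see the example below).
I tried to build a kind of Cartesian product from a list of attributes using two nested for loops, and for some reason the value in the result is always empty.
I experimented with it, and in the end it only worked when I set the result Text while iterating (I know, it seems strange to me too).
I'd appreciate help understanding the problem; I'm probably doing something wrong.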

This is the input file I have:

A 1
B 2
C 1
D 2
C 2
E 1

I'd like to get the following output:

1 A-C, A-E, C-E
2 B-C, B-D, C-D

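So I tried to implement the following MapReduce class:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DigitToPairOfLetters {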

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, Text> {

        private Text digit = new Text();
        private Text letter = new Text();

        public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                letter.set(itr.nextToken());
                digit.set(itr.nextToken());
                context.write(digit, letter);
            }
        }
    }

    public static class DigitToLetterReducer
            extends Reducer<Text, Text, Text, Text> {
        private Text result = new Text();

        public void reduce(Text key, Iterable<Text> values,
                Context context
                ) throws IOException, InterruptedException {
            List<String> valuesList = new ArrayList<>();
            for (Text value :values) {
                valuesList.add(value.toString());
            }
            StringBuilder builder = new StringBuilder();
            for (int i=0; i<valuesList.size(); i++) {
                for (int j=i+1; j<valuesList.size(); j++) {
                    builder.append(valuesList.get(i)).append(" ")
                            .append(valuesList.get(j)).append(",");
                }
            }
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "digit to letter");
        job.setJarByClass(DigitToPairOfLetters.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(DigitToLetterReducer.class);
        job.setReducerClass(DigitToLetterReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

But this code gives me the following output, with empty values:

1
2

When I set the result inside the for loops, it seems to work:
public class DigitToPairOfLetters {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, Text> {

        private Text digit = new Text();
        private Text letter = new Text();

        public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                letter.set(itr.nextToken());
                digit.set(itr.nextToken());
                context.write(digit, letter);
            }
        }
    }

    public static class DigitToLetterReducer
            extends Reducer<Text, Text, Text, Text> {
        private Text result = new Text();

        public void reduce(Text key, Iterable<Text> values,
                Context context
                ) throws IOException, InterruptedException {
            List<String> valuesList = new ArrayList<>();
            for (Text value :values) {
                valuesList.add(value.toString());
                // TODO: We set the valuesList in the result since otherwise the
                // hadoop process will ignore the values in it.
                result.set(valuesList.toString());
            }
            StringBuilder builder = new StringBuilder();
            for (int i=0; i<valuesList.size(); i++) {
                for (int j=i+1; j<valuesList.size(); j++) {
                    builder.append(valuesList.get(i)).append(" ")
                            .append(valuesList.get(j)).append(",");
                    // TODO: We set the builder every iteration in the loop since otherwise the hadoop process will
                    // ignore the values
                    result.set(builder.toString());
                }
            }
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "digit to letter");
        job.setJarByClass(DigitToPairOfLetters.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(DigitToLetterReducer.class);
        job.setReducerClass(DigitToLetterReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

This gives me the following result:

1   [A C,A E,C E]
2   [B C,B D,C D]

I'd appreciate your help.

Best answer

Your first approach looks fine; you just need to add the following line:

result.set(builder.toString());

before

context.write(key, result);

just as you did in the second version.
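The effect of setting the value once, after the loops, can be sketched in plain Java without a cluster. This is only an illustration under stated assumptions: `MutableText` is a hypothetical stand-in for `org.apache.hadoop.io.Text`, and `PairSketch.joinPairs` is a made-up helper name. Sorting the values and using the `-` and `, ` separators are choices made here to match the desired output in the question; they are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.io.Text so the sketch runs without Hadoop.
final class MutableText {
    private String value = "";

    void set(String v) {
        value = v;
    }

    @Override
    public String toString() {
        return value;
    }
}

final class PairSketch {
    // Builds "A-C, A-E, C-E" style output from the values collected for one key.
    static String joinPairs(Iterable<String> values) {
        List<String> sorted = new ArrayList<>();
        for (String v : values) {
            sorted.add(v);
        }
        // Assumption: sort so the pairs come out in alphabetical order,
        // whatever order the values arrive in.
        Collections.sort(sorted);

        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < sorted.size(); i++) {
            for (int j = i + 1; j < sorted.size(); j++) {
                if (builder.length() > 0) {
                    builder.append(", ");
                }
                builder.append(sorted.get(i)).append('-').append(sorted.get(j));
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        MutableText result = new MutableText();
        String pairs = joinPairs(Arrays.asList("B", "D", "C"));

        // Building the string does not touch the holder: it is still empty here.
        System.out.println("before set: '" + result + "'");

        // One set() after the loops is enough; this is the line the answer adds.
        result.set(pairs);
        System.out.println("after set:  '" + result + "'");
    }
}
```

In the real reducer the same idea amounts to calling `result.set(builder.toString())` once, immediately before `context.write(key, result)`.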

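context.write emits the key together with whatever result holds at that moment. In the first version nothing is ever copied from builder into result, so result is still an empty Text and only the key shows up in the output. You therefore need to set the value (A-E and so on) on result before writing it.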
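One detail the answer does not cover is the square brackets in the second version's output. A plausible explanation, offered here as an assumption rather than something confirmed in the thread, is that `job.setCombinerClass(DigitToLetterReducer.class)` makes the same reduce logic run twice: once as a combiner on the raw letters and once as the reducer on the combiner's single output value. The plain-Java sketch below mirrors the body of the second `reduce()` listing in a hypothetical helper named `reduceOnce` and applies it twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CombinerSketch {
    // Mirrors the body of the second reduce() listing and returns the value
    // it would write. The local variable plays the role of the Text field "result".
    static String reduceOnce(List<String> values) {
        String result = "";
        List<String> valuesList = new ArrayList<>();
        for (String value : values) {
            valuesList.add(value);
            result = valuesList.toString();
        }
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < valuesList.size(); i++) {
            for (int j = i + 1; j < valuesList.size(); j++) {
                builder.append(valuesList.get(i)).append(" ")
                        .append(valuesList.get(j)).append(",");
                result = builder.toString();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // First pass (combiner): three letters in, one pair string out.
        String combined = reduceOnce(Arrays.asList("A", "C", "E"));
        // Second pass (reducer): a single value, so the nested loop never runs
        // and the list's toString() with brackets is what remains in result.
        String reduced = reduceOnce(Arrays.asList(combined));
        System.out.println(combined);
        System.out.println(reduced);
    }
}
```

Under that assumption the second pass yields the bracketed form seen in the question, apart from a trailing comma that the posted output does not show. Hadoop may run a combiner zero or more times, so a combiner is only safe when applying it repeatedly does not change the final result; pair generation does not have that property, and dropping the `setCombinerClass` call is likely the safer configuration for this job.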

This post on "java - Hadoop Map Reduce - Nested loop in reduce ignores the Text result when writing Iterable<Text> values to the context" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/52181442/
