java - Hadoop inverted index with counts

Reposted by: 可可西里 | Updated: 2023-11-01 15:37:09


I have two files as input:

fileA.txt:

learn hadoop
learn java

fileB.txt:

hadoop java
eclipse eclipse

Expected output:

learn   fileA.txt:2

hadoop  fileA.txt:1 , fileB.txt:1

java    fileA.txt:1 , fileB.txt:1

eclipse fileB.txt:2

My reduce method:

public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {

    Set<Text> outputValues = new HashSet<Text>();
    while (values.hasNext()) {
        // Copy the value: Hadoop reuses the same Text object across iterations
        Text value = new Text(values.next());
        // The set drops duplicates, so the number of occurrences per file is lost here
        outputValues.add(value);
    }
    boolean isfirst = true;
    StringBuilder toReturn = new StringBuilder();
    Iterator<Text> outputIter = outputValues.iterator();
    while (outputIter.hasNext()) {
        // Separate file names with "/" (no separator before the first one)
        if (!isfirst) {
            toReturn.append("/");
        }
        isfirst = false;
        toReturn.append(outputIter.next().toString());
    }
    output.collect(key, new Text(toReturn.toString()));
}

I need help with the counting (the number of occurrences of each word per file).

I managed to print:

learn   fileA.txt

hadoop  fileA.txt / fileB.txt

java    fileA.txt / fileB.txt

eclipse fileB.txt

But I can't print the count for each file.

Any help would be greatly appreciated.
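Why the counts disappear can be seen without Hadoop. The following is a minimal plain-Java sketch, assuming the mapper emits the file name once per word occurrence (the mapper is not shown in the question). The class name `DedupLosesCounts` and the method `distinctFiles` are hypothetical, introduced only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; not part of the original question.
final class DedupLosesCounts {

    // Mirrors the reducer in the question: every value goes into a set.
    static Set<String> distinctFiles(List<String> values) {
        return new HashSet<>(values);
    }

    public static void main(String[] args) {
        // Assumed reducer input for the key "learn": one file name per occurrence.
        List<String> values = Arrays.asList("fileA.txt", "fileA.txt");

        // Two occurrences arrive at the reducer...
        System.out.println("values received: " + values.size());
        // ...but the set keeps a single entry, so the count cannot be recovered.
        System.out.println("distinct files:  " + distinctFiles(values).size());
    }
}
```

A set only records whether a file name was seen; keeping the count requires a structure that maps each file name to a number.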

Best answer

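As I understand it, this should print what you want. It counts how often each file name appears among the values, so it relies on the mapper emitting the file name once per word occurrence: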

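// Needs: java.io.IOException, java.util.HashMap, java.util.Iterator, java.util.Map,
// org.apache.hadoop.io.Text, org.apache.hadoop.mapred.OutputCollector,
// org.apache.hadoop.mapred.Reporter
@Override
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // File name -> number of times the current word (key) occurs in that file
    Map<String, Integer> fileToCnt = new HashMap<String, Integer>();
    while (values.hasNext()) {
        String file = values.next().toString();
        Integer current = fileToCnt.get(file);
        if (current == null) {
            current = 0;
        }
        fileToCnt.put(file, current + 1);
    }
    // Join the entries as "file:count, file:count"; HashMap does not guarantee the order
    boolean isfirst = true;
    StringBuilder toReturn = new StringBuilder();
    for (Map.Entry<String, Integer> entry : fileToCnt.entrySet()) {
        if (!isfirst) {
            toReturn.append(", ");
        }
        isfirst = false;
        toReturn.append(entry.getKey()).append(":").append(entry.getValue());
    }
    output.collect(key, new Text(toReturn.toString()));
}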
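To check the counting logic against the sample files without a cluster, here is a rough plain-Java sketch. It is not Hadoop code: `InvertedIndexCountSketch`, `mapAndShuffle`, and the string-based `reduce` are hypothetical stand-ins, and the sketch assumes the mapper emits one (word, file name) pair per occurrence. It also swaps the answer's `HashMap` for a `LinkedHashMap` so the output order is predictable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; it simulates the job in memory and uses no Hadoop classes.
final class InvertedIndexCountSketch {

    // Same counting idea as the reducer above, applied to plain strings.
    static String reduce(List<String> fileNames) {
        Map<String, Integer> fileToCnt = new LinkedHashMap<>();
        for (String file : fileNames) {
            fileToCnt.merge(file, 1, Integer::sum); // add 1 to the count for this file
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : fileToCnt.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append(":").append(e.getValue());
        }
        return sb.toString();
    }

    // Assumed stand-in for map + shuffle: one file name per word occurrence, grouped by word.
    static Map<String, List<String>> mapAndShuffle(Map<String, String> files) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> f : files.entrySet()) {
            for (String word : f.getValue().trim().split("\\s+")) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(f.getKey());
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // The two sample inputs from the question.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("fileA.txt", "learn hadoop\nlearn java");
        files.put("fileB.txt", "hadoop java\neclipse eclipse");

        mapAndShuffle(files).forEach(
                (word, names) -> System.out.println(word + "\t" + reduce(names)));
    }
}
```

In this sketch the words come out in first-seen order; a real Hadoop job sorts keys during the shuffle, so the line order there would differ.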

For "java - Hadoop inverted index with counts", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23411464/
