hadoop - Clarification on an earlier post (processing the first N lines of an input file)


Reposted by: 行者123 Updated: 2023-12-02 21:49:45

I want to create a mapper that processes only the first k lines of the input file. I came across this post:

Hadoop-> Mapper->How can we read only Top N rows from each file from given input path?
It says to override the run method as follows:

@Override
public void run(Context context) throws IOException, InterruptedException {
  setup(context);

  int rows = 0;
  while (context.nextKeyValue()) {
    if (rows++ == 10) {
      break;
    }

    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }

  cleanup(context);
}

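So I tried that solution, but the compiler cannot find "Context" or "setup()". I tried importing org.apache.hadoop.mapreduce.Mapper.*, but that did not help.
Also, could someone explain the parameters of the map() function?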

Sample code

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> 
{
    @Override
    public void run(Context context) throws IOException, InterruptedException //for reading the first k lines only
    {
        setup(context);

        int k = 5;

        int rows = 0;
        while (context.nextKeyValue()) 
        {
            if (rows++ == k)        break;
                map(context.getCurrentKey(), context.getCurrentValue(), context);
        }

        cleanup(context);
    }

}

Best answer

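You are not extending the Mapper class. MapReduceBase belongs to the older org.apache.hadoop.mapred API, whereas Context, setup() and run(Context) are members of the org.apache.hadoop.mapreduce.Mapper class, so they are only available when your class extends that class instead of implementing an interface.

It should look like this, for example: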

MyMapper extends Mapper<Object, Text, Text, IntWritable>

See the examples here:
http://wiki.apache.org/hadoop/WordCount
http://hadoop.apache.org/docs/stable/api/src-html/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html (this one also overrides run())
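
To make the override pattern concrete without needing a Hadoop installation, here is a minimal plain-Java sketch. Every class in it (Context, BaseMapper, FirstKMapper) is a hypothetical stand-in written for illustration, not a Hadoop class; they only imitate the shape of the run loop in org.apache.hadoop.mapreduce.Mapper. It also touches on the map() parameters asked about in the question: with the default text input format, the key is the byte offset of the line, the value is the line itself, and the context is where output is written. The fake Context below mimics that by counting characters, which is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FirstKLinesSketch {

    /** Hypothetical stand-in for a mapper context: feeds records in, collects output. */
    static class Context {
        private final Iterator<String> lines;
        private long nextOffset = 0;   // imitates the byte-offset key of a text input
        private long currentKey;
        private String currentValue;
        final List<String> output = new ArrayList<>();

        Context(List<String> input) {
            this.lines = input.iterator();
        }

        /** Advances to the next record; returns false when the input is exhausted. */
        boolean nextKeyValue() {
            if (!lines.hasNext()) {
                return false;
            }
            currentValue = lines.next();
            currentKey = nextOffset;
            nextOffset += currentValue.length() + 1; // +1 for the newline
            return true;
        }

        long getCurrentKey() { return currentKey; }

        String getCurrentValue() { return currentValue; }

        void write(String key, String value) { output.add(key + "\t" + value); }
    }

    /** Stand-in base class: run() calls setup, then map once per record, then cleanup. */
    static class BaseMapper {
        protected void setup(Context context) { }

        protected void map(long key, String value, Context context) {
            context.write(Long.toString(key), value);
        }

        protected void cleanup(Context context) { }

        public void run(Context context) {
            setup(context);
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
            cleanup(context);
        }
    }

    /** Subclass that overrides run() so only the first k records reach map(). */
    static class FirstKMapper extends BaseMapper {
        private final int k;

        FirstKMapper(int k) {
            this.k = k;
        }

        @Override
        public void run(Context context) {
            setup(context);
            int rows = 0;
            // Check the counter first so no record beyond the k-th is read.
            while (rows < k && context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
                rows++;
            }
            cleanup(context);
        }
    }

    /** Runs the first-k mapper over the given lines and returns what it emitted. */
    public static List<String> processFirstK(List<String> input, int k) {
        Context context = new Context(input);
        new FirstKMapper(k).run(context);
        return context.output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("alpha", "beta", "gamma", "delta");
        System.out.println(processFirstK(input, 2));
    }
}
```

The point of the sketch is that setup(), map(), cleanup() and run() are inherited from a base class, so overriding run() only compiles when the mapper extends that class. In a real job the subclass would extend org.apache.hadoop.mapreduce.Mapper with Hadoop key and value types, and the limit would apply to each input split separately, since each map task runs its own loop.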

Regarding hadoop - Clarification on an earlier post (processing the first N lines of an input file), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22088221/
