java - Hadoop MapReduce - Euler's Totient / Sum of Totients (and other mathematical operations)

Reposted by: 可可西里. Updated: 2023-11-01 15:09:33

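As part of my research, I'm implementing the sum of totients (Euler's totient function) in several parallel computing languages, and honestly I'm struggling quite a bit with MapReduce. The main goal is to benchmark runtime, efficiency, and so on.

My code now runs and I get the correct output, but it is slow, and I'd like to know why.

Is it because of my implementation, or because Hadoop MapReduce isn't designed for this purpose? I also implemented a combiner because, from what I've read, it should optimize the code, but it doesn't. Sorry if this question seems silly, but I haven't found anything on the internet, and I'm tired of trying everything without any result.

My input file contains the values from 1 to 15000: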

1 2 3 4 5 6 ... 14998 14999 15000

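I'm working on a 32-node cluster, and my goal is for each node to compute part of my range (in the combiner) and then, in the reducer, sum all the "sub-sums" produced by the combiners.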

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewTotient {

  public static long hcf(long x, long y)
  {
    long t;

    while (y != 0) {
      t = x % y;
      x = y;
      y = t;
    }
    return x;
  }

  public static boolean relprime(long x, long y)
  {
    return hcf(x, y) == 1;
  }

  public static long euler(long n)
  {
    long length, i;

    length = 0;
    for (i = 1; i < n; i++)
      if (relprime(n, i))
        length++;
    return length;
  }

  public static class TotientMapper extends Mapper<LongWritable, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        for (String val : value.toString().split(" ")) {
            context.write(new Text(), new IntWritable(Integer.valueOf(val)));
        }
    }
  }

  public static class TotientCombiner extends Reducer<Text,IntWritable,Text,IntWritable> {
    //private IntWritable result = new IntWritable();

    protected void reduce(Text key, Iterable<IntWritable> values, Context context)throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
              sum += NewTotient.euler(val.get());
          }
      }
  }

  public static class TotientReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
    //private IntWritable result = new IntWritable();

    protected void reduce(Text key, Iterable<IntWritable> values, Context context)throws IOException, InterruptedException {
          int sum = 1;
          for (IntWritable val : values) {
              sum += val.get();
          }
          context.write(null, new IntWritable(sum));
      }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    System.out.println("\n\n__________________________________________________________\n"+"Starting Job\n"+"__________________________________________________________\n\n");
    final long startTime = System.currentTimeMillis();

    Job job = Job.getInstance(conf, "Sum of Totient");
    job.setJarByClass(NewTotient.class);
    job.setMapperClass(TotientMapper.class);
    job.setCombinerClass(TotientCombiner.class);
    job.setReducerClass(TotientReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    //job.setOutputKeyClass(Text.class);
    //job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.waitForCompletion(true);
    final double duration = (System.currentTimeMillis() - startTime)/1000.0;
    System.out.println("\n\n__________________________________________________________\n"+"Job Finished in " + duration + " seconds\n"+"__________________________________________________________\n\n");
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If it helps, here is my output for a dataset from 0 to 10 (so basically I'm just computing the sum of the first 10 totients):

__________________________________________________________
Starting Job
__________________________________________________________


2018-04-02 06:09:27,583 INFO client.RMProxy: Connecting to ResourceManager at bwlf32/137.195.143.132:33312
2018-04-02 06:09:28,377 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2018-04-02 06:09:28,423 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/jo20/.staging/job_1522471222360_0016
2018-04-02 06:09:28,775 INFO input.FileInputFormat: Total input files to process : 1
2018-04-02 06:09:29,029 INFO mapreduce.JobSubmitter: number of splits:1
2018-04-02 06:09:29,101 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-02 06:09:29,288 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1522471222360_0016
2018-04-02 06:09:29,290 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-04-02 06:09:29,538 INFO conf.Configuration: resource-types.xml not found
2018-04-02 06:09:29,539 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-04-02 06:09:29,628 INFO impl.YarnClientImpl: Submitted application application_1522471222360_0016
2018-04-02 06:09:29,687 INFO mapreduce.Job: The url to track the job: http://bwlf32:33314/proxy/application_1522471222360_0016/
2018-04-02 06:09:29,688 INFO mapreduce.Job: Running job: job_1522471222360_0016
2018-04-02 06:09:37,849 INFO mapreduce.Job: Job job_1522471222360_0016 running in uber mode : false
2018-04-02 06:09:37,852 INFO mapreduce.Job:  map 0% reduce 0%
2018-04-02 06:09:44,960 INFO mapreduce.Job:  map 100% reduce 0%
2018-04-02 06:09:52,008 INFO mapreduce.Job:  map 100% reduce 100%
2018-04-02 06:09:52,022 INFO mapreduce.Job: Job job_1522471222360_0016 completed successfully
2018-04-02 06:09:52,178 INFO mapreduce.Job: Counters: 53
    File System Counters
        FILE: Number of bytes read=6
        FILE: Number of bytes written=414497
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=123
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=9126
        Total time spent by all reduces in occupied slots (ms)=9688
        Total time spent by all map tasks (ms)=4563
        Total time spent by all reduce tasks (ms)=4844
        Total vcore-milliseconds taken by all map tasks=4563
        Total vcore-milliseconds taken by all reduce tasks=4844
        Total megabyte-milliseconds taken by all map tasks=1168128
        Total megabyte-milliseconds taken by all reduce tasks=1240064
    Map-Reduce Framework
        Map input records=1
        Map output records=10
        Map output bytes=50
        Map output materialized bytes=6
        Input split bytes=102
        Combine input records=10
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=6
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=157
        CPU time spent (ms)=2220
        Physical memory (bytes) snapshot=507772928
        Virtual memory (bytes) snapshot=3889602560
        Total committed heap usage (bytes)=347078656
        Peak Map Physical memory (bytes)=306073600
        Peak Map Virtual memory (bytes)=1945808896
        Peak Reduce Physical memory (bytes)=201699328
        Peak Reduce Virtual memory (bytes)=1943793664
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=21
    File Output Format Counters
        Bytes Written=0


__________________________________________________________
Job Finished in 26.225 seconds
__________________________________________________________


2018-04-02 06:09:52,182 INFO mapreduce.Job: Running job: job_1522471222360_0016
2018-04-02 06:09:52,188 INFO mapreduce.Job: Job job_1522471222360_0016 running in uber mode : false
2018-04-02 06:09:52,188 INFO mapreduce.Job:  map 100% reduce 100%
2018-04-02 06:09:52,193 INFO mapreduce.Job: Job job_1522471222360_0016 completed successfully
2018-04-02 06:09:52,201 INFO mapreduce.Job: Counters: 53
    File System Counters
        FILE: Number of bytes read=6
        FILE: Number of bytes written=414497
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=123
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=9126
        Total time spent by all reduces in occupied slots (ms)=9688
        Total time spent by all map tasks (ms)=4563
        Total time spent by all reduce tasks (ms)=4844
        Total vcore-milliseconds taken by all map tasks=4563
        Total vcore-milliseconds taken by all reduce tasks=4844
        Total megabyte-milliseconds taken by all map tasks=1168128
        Total megabyte-milliseconds taken by all reduce tasks=1240064
    Map-Reduce Framework
        Map input records=1
        Map output records=10
        Map output bytes=50
        Map output materialized bytes=6
        Input split bytes=102
        Combine input records=10
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=6
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=157
        CPU time spent (ms)=2220
        Physical memory (bytes) snapshot=507772928
        Virtual memory (bytes) snapshot=3889602560
        Total committed heap usage (bytes)=347078656
        Peak Map Physical memory (bytes)=306073600
        Peak Map Virtual memory (bytes)=1945808896
        Peak Reduce Physical memory (bytes)=201699328
        Peak Reduce Virtual memory (bytes)=1943793664
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=21
    File Output Format Counters
        Bytes Written=0

My sequential code in Java is much faster:

real    0m0.512s
user    0m0.279s
sys     0m0.142s

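To be clear, I have to use this way of computing because it is slow enough to allow interesting comparisons between different systems. I can't speed things up with a smarter method, even though I know one exists: find all the prime factors of n, count them together with their multiples, and subtract that count from n to get the totient value (prime factors and their multiples cannot have a gcd of 1 with n).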
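For reference, the smarter method mentioned above is usually written as the product formula phi(n) = n * (1 - 1/p1) * (1 - 1/p2) * ..., taken over the distinct prime factors of n. The sketch below is not the asker's code: the class and method names are hypothetical, and the brute-force variant loops up to and including n so that phi(1) = 1 without any special case. It only illustrates that both approaches agree.

```java
final class TotientFormula {

    // Euclid's algorithm, same idea as hcf() in the question.
    static long gcd(long x, long y) {
        while (y != 0) {
            long t = x % y;
            x = y;
            y = t;
        }
        return x;
    }

    // Brute force: count every i in [1, n] that is coprime to n.
    static long phiBruteForce(long n) {
        long count = 0;
        for (long i = 1; i <= n; i++) {
            if (gcd(n, i) == 1) {
                count++;
            }
        }
        return count;
    }

    // Product formula: for each distinct prime p dividing n,
    // multiply the running result by (1 - 1/p).
    static long phiByFactoring(long n) {
        long result = n;
        long m = n;
        for (long p = 2; p * p <= m; p++) {
            if (m % p == 0) {
                while (m % p == 0) {
                    m /= p;          // strip every copy of p
                }
                result -= result / p;
            }
        }
        if (m > 1) {
            result -= result / m;    // one prime factor above sqrt(n) remains
        }
        return result;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (long n = 1; n <= 10; n++) {
            sum += phiByFactoring(n);
        }
        System.out.println("Sum of totients for 1..10: " + sum); // 32
    }
}
```

The brute-force version does work roughly proportional to n per value, while the factoring version stops at the square root of n, which is why the question deliberately keeps the slow one for benchmarking.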

Best answer

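Here you are providing the input from the file as a single line. The mapper receives one record per line of input, so with only one line everything is handled by a single map task and the input is not processed in parallel. One thing you can do is put each input number on its own line instead of separating the numbers with spaces, and change the mapper accordingly. The combiner also doesn't make much sense here, since you aren't using different keys in the map output.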
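The suggestion can be sketched in plain Java without a cluster. This is a local stand-in, not Hadoop code: `toLines`, `mapRecord` and `reduce` are hypothetical helpers that mirror reformatting the input to one number per line, computing one totient per record in the map step, and summing in the reduce step. The totient is computed in the map step here because Hadoop may run a combiner zero, one or several times, so a combiner should only do what the reducer does (here, summing). A small file will probably still fit in a single input split, so an input format such as `NLineInputFormat` may also be needed to spread the lines over several map tasks; that goes beyond what the answer states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class LineSplitSketch {

    static long gcd(long x, long y) {
        while (y != 0) {
            long t = x % y;
            x = y;
            y = t;
        }
        return x;
    }

    // Brute-force totient; the loop includes n so that euler(1) == 1.
    static long euler(long n) {
        long count = 0;
        for (long i = 1; i <= n; i++) {
            if (gcd(n, i) == 1) {
                count++;
            }
        }
        return count;
    }

    // Rewrites "1 2 3" as one number per line, i.e. one record per number.
    static List<String> toLines(String singleLine) {
        return Arrays.stream(singleLine.trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Stand-in for map(): one input record in, one totient out.
    static long mapRecord(String line) {
        return euler(Long.parseLong(line.trim()));
    }

    // Stand-in for reduce(): a plain sum, which is associative and
    // commutative, so the same logic could also serve as a combiner.
    static long reduce(List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> mapped = new ArrayList<>();
        for (String line : toLines("1 2 3 4 5 6 7 8 9 10")) {
            mapped.add(mapRecord(line));
        }
        System.out.println(reduce(mapped)); // 32
    }
}
```

In the real job, the body of `mapRecord` would correspond to parsing `value.toString().trim()` inside `map()` and writing the totient under a single shared key.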

Regarding java - Hadoop MapReduce - Euler's Totient / Sum of Totients (and other mathematical operations), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49605690/
