performance - 为什么这个 Monte Carlo Haskell 程序这么慢？-6ren

performance - 为什么这个 Monte Carlo Haskell 程序这么慢？

转载作者：行者123 更新时间：2023-12-04 15:04:21

更新:

我的编译命令是ghc -O2 Montecarlo.hs。我的随机版本是random-1.1，ghc版本是8.6.4，我的系统是macOS Big Sur 11.1(Intel芯片)。我用来测试速度的命令是time ./Montecarlo 10000000，它返回的结果是real 0m17.463s, user 0m17.176s, sys 0m0.162s。

以下是使用蒙特卡洛计算pi的Haskell程序。但是，当输入为1000万时，程序运行了20秒。同样逻辑编写的C程序只用了0.206秒。为什么会这样，我怎样才能加快速度？谢谢大家。

这是 Haskell 版本:

import System.Random
import Data.List
import System.Environment

montecarloCircle :: Int -> Double
montecarloCircle x
   = 4*fromIntegral
        (foldl' (\x y -> if y <= 1 then x+1 else x) 0
           $ zipWith (\x y -> (x**2 + y**2))
                     (take x $ randomRs (-1.0,1) (mkStdGen 1) :: [Double])
                     (take x $ randomRs (-1.0,1) (mkStdGen 2) :: [Double]) )
        / fromIntegral x

main = do
    num <- getArgs
    let n = read (num !! 0)
    print $ montecarloCircle n

这是 C 版本:

#include <stdio.h>
#include <math.h>
#include <time.h>
#include <stdlib.h>

#define N       10000000

#define int_t   long // the type of N and M

// Rand from 0.0 to 1.0
double rand01()
{
    return rand()*1.0/RAND_MAX;
}

int main()
{
        srand((unsigned)time(NULL));

        double x,y;
        int_t M = 0;

        for (int_t i = 0;i < N;i++)
        {
                x = rand01();
                y = rand01();
                if (x*x+y*y<1) M++;
        }
        double pi = (double)4*M/N;
        printf("%lf\n", pi);
}

最佳答案

在某种程度上，在数值应用程序中，Haskell 代码往往比用经典命令式语言(如 C/C++/Fortran)编写的等效代码慢 2 到 5 倍。然而，Haskell 代码慢了 100 倍是非常意外的!

完整性检查:重现结果

使用带有 Intel Celeron N2840 64 位 cpu、运行 Linux kernel 5.3、GLIBC 2.28、GCC 8.3、GHC 8.2.2、GHC random-1.1 的有点旧的笔记本，我们得到了计时:

C code,        gcc -O3:    950 msecHaskell code,  ghc -O3:    104 sec

So indeed, with these configurations, the Haskell code runs about 100 times slower than the C code.

Why is that ?

A first remark is that the π computing arithmetics atop random number generation looks quite simple, hence the runtimes are probably dominated by the generation of 2*10 millions pseudo-random numbers. Furthermore, we have every reason to expect that Haskell and C are using completely unrelated random number generation algorithms.

So, rather than comparing Haskell with C, we could be comparing the random number generation algorithms these 2 languages happen to be using, as well as their respective implementations.

In C, the language standard does not specify which algorithm function srand() is supposed to use, but that tends to be old, simple and fast algorithms.

On the other hand, in Haskell, we have traditionally seen a lot of complaints about the poor efficiency of StdGen, like here back in 2006:

https://gitlab.haskell.org/ghc/ghc/-/issues/427

where one of the leading Haskell luminaries mentions that StdGen could possibly be made 30 times faster.

Fortunately, help is already on the way. This recent blog post explains how the new Haskell random-1.2 package is solving the problem of StdGen lack of speed, using a completely different algorithm known as splitmix.

The field of pseudo-random number generation (PRNG) is a rather active one. Algorithms routinely get obsoleted by newer and better ones. For the sake of perspective, here is a relatively recent (2018) review paper on the subject.

Moving to more recent, better Haskell components:

Using another, slightly more powerful machine with an Intel Core i5-4440 3GHz 64 bits cpu, running Linux kernel 5.10, GLIBC 2.32, GCC 10.2, and critically Haskell package random-1.2:

C code,        gcc -O3:    164 msecHaskell code,  ghc -O3:    985 msec

So Haskell is now “just” 6 times slower instead of 100 times.

And we still have to address the injustice of Haskell having to use x**2+y**2 versus C getting x*x+y*y, a detail which did not really matter before with random-1.1. This gives us 379 msec ! So we are back into the usual 2X-5X ballpark for Haskell to C speed comparisons.

Note that if we ask the Haskell executable to run with statistics on, we get the following output:

$ time q66441802.x +RTS -s -RTS 100000003.1415616     925,771,512 bytes allocated in the heap     ...  Alloc rate    2,488,684,937 bytes per MUT second  Productivity  99.3% of total user, 99.3% of total elapsedreal   0m0,379suser   0m0,373ssys    0m0,004s

so Haskell is found to allocate close to one gigabyte of memory along the way, which helps to understand the speed difference.

A code fix:

We note that the C code uses a single random serie while the Haskell code is using two, with two calls to mkStdGen with numbers 1 and 2 as seeds. This is not only unfair but also incorrect. Applied mathematicians who design PRNG algorithms take great care to ensure any single serie has the right statistical properties, but give essentially no guarantees about possible correlations between different series. It is not unheard of to even use the seed as an offset into a single global sequence.

This is fixed in the following code, which does not alter performance significantly:

computeNorms :: [Double] -> [Double]
computeNorms []  = []
computeNorms [x] = []
computeNorms (x:y:xs2) = (x*x + y*y) : (computeNorms xs2)

monteCarloCircle :: Int -> Double
monteCarloCircle nn =
    let
         randomSeed   =  1
         gen0         =  mkStdGen randomSeed
                      -- use a single random serie:
         uxys         =  (randomRs  (-1.0, 1.0)  gen0) :: [Double]
         norms        =  take nn (computeNorms uxys)
         insiderCount =  length  $  filter (<= 1.0) norms
    in
         (4.0::Double) * ((fromIntegral insiderCount) / (fromIntegral nn))

边注:

这个 recent SO question 中提到了新的 Haskell random-1.2 包，尽管是在新的 monadic 接口(interface)的上下文中。

结语:

假设练习的目标是衡量 C 和 Haskell 语言的相对运行速度，基本上有两种可能性:

一个是完全避免使用 PRNG，因为使用的算法不同。

另一种是通过手动为两种语言提供一种相同的算法来控制随机数的生成。例如，可以使用公开可用的 MRG32k3a由 Pierre L'Ecuyer 设计的算法。 Candidate Haskell MRG32k3a implementation here (截至 2021 年 3 月)。留作读者练习。

关于performance - 为什么这个 Monte Carlo Haskell 程序这么慢？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/66441802/

文章推荐： java - JTextArea:居中对齐

文章推荐： scala - 在 Scala 3 中提取元组的子集

文章推荐： regex - Matlab:正则表达式:除了:函数末尾需要点 '.'

machine-learning - 首次访问与每次访问 Monte Carlo
我最近一直在研究强化学习。为此，我一直在阅读萨顿的名著，但还有一些我还没有完全理解的地方。对于蒙特卡洛学习，我们可以在首次访问和每次访问算法之间进行选择，并且可以证明两者都渐进地收敛到正确的解决方案
c++ - 加速 Monte Carlo Pi
我编写了一个 C++ 程序，用于通过“将随机点放入四分之一圆中并计算它们等”来计算圆周率。现在我的程序在我看来有点慢，我想过一些改进来加快它的速度(源代码在下面)。我的第一个想法是使用 OpenMP
C#、日期时间格式、mont 名称、cultureinfo
我目前正在开发一个小程序，我想根据 cultureinfo 清楚地显示日期和时间，如下所示:2017 年 3 月 3 日星期日上午 7:46:57 但是，我希望可以选择任何国家，并根据他们写日期的方式
c++ - Monte Carlo Sims - 请检查我的算法
基本上，这个问题模拟了以下内容: 有一个装有 50 个绿球和 50 个红球的瓮。我可以从 jar 里取出球，无需更换，规则如下:每取出一个红球，我将损失一美元，每取出一个绿色球，我将获得一美元。我
performance - 为什么这个 Monte Carlo Haskell 程序这么慢？
更新: 我的编译命令是ghc -O2 Montecarlo.hs。我的随机版本是random-1.1，ghc版本是8.6.4，我的系统是macOS Big Sur 11.1(Intel芯片)。我用来测
r - 基于置换法的方差分析中 F 统计量的 Monte Carlo 估计
我正在尝试使用组标识符 g 在 (y1,...,yN) 上对 ANOVA 进行排列检验.我应该使用 (1)/(g-1) (muhatj - muhat)^2 的总和作为测试统计量，而 muhatj 是
R代码: Is there a way to make this Monte Carlo simulation quicker?
想象一下我递给你一个上面印有“-1”的乒乓球。然后我告诉你从标有“第一袋”的袋子中取出另一个乒乓球。这个袋子里有 30,000 个球，有的标有“-1”，有的标有“0”，有的标有“+1”。无论您抽到哪个
python - mcint 模块 Python-Monte Carlo 集成
我正在尝试运行一个代码，通过使用蒙特卡罗积分对一维高斯分布方程进行积分来输出高斯分布。我正在尝试使用 mcint 模块。我定义了 mcint 模块中使用的高斯方程和采样器函数。我不确定 mcint 函
r - 如何在 R 中为 Monte Carlo 创建更有效的模拟循环
此练习的目的是创建营养摄入值的人口分布。之前的数据中有重复的度量，这些已被删除，因此每一行都是数据框中的唯一人。我有这个代码，当使用少量我的数据框行进行测试时，它工作得很好。对于所有 7135 行，
c# - 优化 Poker-Monte-Carlo-Simulation 的手牌评估算法
我已经为 Hold'em Poker 编写了一个平衡器作为一个业余项目。它工作正常，但还有一件事我不满意:在整个模拟手的过程中，评估手的过程大约占用了35%的时间。与迭代和克隆大型数组等其他必须完成的
python - 在 python 中加速 monte carlo 代码
考虑从 [0,T) 开始按递增顺序给出的点 Y。我们要将这些点视为位于圆周 T 的圆上。现在考虑来自 [0,T) 的点 X 也位于圆周 T 的圆上。我们说 X 和 Y 之间的距离是 X 中的每个点与
r - 使用 R 来尊重 LatinHypercube/Monte Carlo 试验的相关性
我目前正在使用 python 和 RPY 来使用 R 中的功能。我如何使用 R 库生成蒙特卡罗样本，以尊重 2 个变量之间的相关性.. 例如如果变量 A 和 B 具有 85% (0.85) 的相关
python - Python 中的 Monte Carlo 和 Metropolis 算法非常慢
我正在尝试用 Python 实现一个简单的蒙特卡洛(我对此还很陌生)。来自 C 我可能走的是最错误的道路，因为我的代码对于我所要求的来说太慢了:对于 60 个 3d 粒子和周期性边界条件(PBC)，我
r - R 中的 Monte Carlo 模拟、Bootstrap 和回归
我已经使用 SAS 很长时间了，现在我想用 R 翻译我的代码。我需要帮助来执行以下操作: 生成多个引导样本对每个样本运行线性回归模型通过复制样本将参数存储在新数据集中为了更清晰，我编辑了这段代码
基于 Monte Carlo 的 Pi 计算的 Python 高效矢量化
为了近似 Pi 的值，请考虑使用随机值填充数组并测试是否包含单位圆的随机方法， import random as rd import numpy as np def r(_): return rd.r
algorithm - 固定 x 的 Monte Carlo pi 方法
在我发现的所有计算 pi 的蒙特卡洛示例代码中，x 和 y 都是在 0 和 1 之间随机生成的。例如，示例代码如下所示 Ran rdm(time(NULL)); double x, y;
R:通过 Permute 或 Sample 函数的 Monte Carlo 程序以生成零分布
从这个数据集中，我有我的聚类分析分配的所有患者样本(总共 69 行)，并且聚类被标记为第 3 列“Cluster.assigned”，总共 8 个聚类，每个聚类大小不等。其他列包含变量，其中我想测试数
machine-learning - 如何使用 PyTorch 计算 Monte Carlo Dropout 神经网络的不确定性？
我正在尝试在 Pytorch 上使用 Mc Dropout 实现贝叶斯 CNN，主要思想是通过在测试时应用 dropout 并运行多次前向传递，您可以获得来自各种不同模型的预测。我需要获得不确定性，有
r - 如何在 R 中使用 Monte Carlo 进行 ARIMA 模拟函数
这是我想用 R 做的算法: 模拟来自 ARIMA 的 10 个时间序列数据集模型通arima.sim()功能将系列拆分为可能的子系列 2s , 3s , 4s , 5s , 6s , 7s , 8s
python - 使用 Monte Carlo 与 scipy.integrate.nquad 的不同积分结果
下面的 MWE 显示了两种对相同 2D 核密度估计进行积分的方法，这些估计是为 this data 获得的。使用 stats.gaussian_kde()功能。对所有 (x, y) 执行集成低于阈值

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城