performance - Why does this Haskell program perform so badly?

Reposted by: 行者123. Updated: 2023-12-03 04:52:11

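I'm a Haskell newbie, and I'm baffled by how badly this program performs. I tried forcing strictness in various places, but it didn't seem to make any difference.

Here is my code (the program's purpose is to report the frequency of each input byte read from stdin):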

{-# LANGUAGE BangPatterns #-}

import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.MVar
import qualified Data.IntMap as IntMap
import Data.IntMap.Strict (IntMap)
import Control.Monad.Fix
import Control.Monad (when)
import qualified Data.Char as Char
import qualified System.IO as IO
import System.IO (hSetBinaryMode, hFlush)
import Data.List as List
import Text.PrettyPrint.Boxes as Boxes
import Text.Printf (printf)
import Data.Function

data BFreq = BFreq Integer (IntMap Integer)

main :: IO ()
main = do
  putStrLn "analyze data from stdin"
  hSetBinaryMode IO.stdin True
  mv <- newEmptyMVar
  tid <- forkIO $ statusUpdater mv
  bf <- run mv
  killThread tid
  displayResults bf

resultTable :: [[String]] -> Box
resultTable rows =
  Boxes.hsep 4 Boxes.left boxed_cols
  where
    cols       = transpose rows
    boxed_cols = map (Boxes.vcat Boxes.left . map text) cols

displayResults :: BFreq -> IO ()
displayResults (BFreq n counts) = do
  putStrLn $ "read " ++ (show n) ++ " bytes"
  when (n > 0) (displayFreqs n counts)

displayFreqs :: Integer -> IntMap Integer -> IO ()
displayFreqs n counts =
  do
    putStrLn "frequencies:"
    Boxes.printBox $ resultTable rows
  where
    cmp x y       = compare (snd y) (snd x)
    sorted_counts = List.sortBy cmp $ IntMap.assocs counts

    intdiv :: Integer -> Integer -> Float
    intdiv a b = (fromIntegral a) / (fromIntegral b)

    percent y    = printf "%.2f" (100*intdiv y n)
    show_byte x  = (show $ Char.chr x) ++ " (" ++ (show x) ++ "):"
    show_count y = (percent y) ++ "% (" ++ (show y) ++ ")"

    rows = map (\(x,y) -> [show_byte x, show_count y]) sorted_counts


run :: MVar Integer -> IO BFreq
run mv = 
  fn mv 0 IntMap.empty 
  where
    fn mv !n !mp =
      do
        tryPutMVar mv n
        eof <- IO.isEOF
        if eof
          then return $ BFreq n mp
          else do
            b <- getChar
            fn mv (1+n) (new_map b)
      where
        k x       = Char.ord x
        old_val x = IntMap.findWithDefault 0 (k x) mp
        new_map x = IntMap.insert (k x) ((old_val x)+1) mp

statusUpdater :: MVar Integer -> IO ()
statusUpdater mv = 
  do
    takeMVar mv >>= print_progress
    statusUpdater mv
  where
    print_progress n = 
      do
        putStr $ "\rbytes: "
        when (gbs > 0) $ putStr $ (show gbs) ++ " GBs "
        when (mbs > 0) $ putStr $ (show mbs) ++ " MBs "
        when (kbs > 0) $ putStr $ (show kbs) ++ " KBs "
        when (gbs < 1 && mbs < 1 && kbs < 1) $ putStr $ (show bs) ++ " Bs "
        hFlush IO.stdout
      where
        (gbs, gbr)   = quotRem n 0x40000000
        (mbs, mbr)   = quotRem gbr 0x100000
        (kbs, bs)    = quotRem mbr 0x400

This is what happens when I run it (note: I compile with -O2):

$> cabal build -v                                                                                             
creating dist/build                                                                                                                       
creating dist/build/autogen                                                                                                                 
Building bfreq-0.1.0.0...                                                                                                                   
Preprocessing executable 'bfreq' for bfreq-0.1.0.0...                                                                                       
Building executable bfreq...                                                                                                                  
creating dist/build/bfreq                                                                                                                     
creating dist/build/bfreq/bfreq-tmp                                                                                                           
/usr/bin/ghc --make -o dist/build/bfreq/bfreq -hide-all-packages -fbuilding-cabal-package -package-conf dist/package.conf.inplace -i -idist/build/bfreq/bfreq-tmp -i. -idist/build/autogen -Idist/build/autogen -Idist/build/bfreq/bfreq-tmp -optP-include -optPdist/build/autogen/cabal_macros.h -odir dist/build/bfreq/bfreq-tmp -hidir dist/build/bfreq/bfreq-tmp -stubdir dist/build/bfreq/bfreq-tmp -package-id base-4.5.0.0-40b99d05fae6a4eea95ea69e6e0c9702 -package-id boxes-0.1.3-e03668bca38fe3e879f9d695618ddef3 -package-id containers-0.5.3.1-80819105034e34d03d22b1c20d6fd868 -O -O2 -rtsopts -XHaskell98 ./bfreq.hs
[1 of 1] Compiling Main             ( bfreq.hs, dist/build/bfreq/bfreq-tmp/Main.o )
Linking dist/build/bfreq/bfreq ...
$> cat /dev/urandom | head -c 9999999 > test_data
$> cat ./test_data | ./dist/build/bfreq/bfreq +RTS -sstderr
analyze data from stdin
bytes: 9 MBs 521 KBs read 9999999 bytes
frequencies:
'\137' (137):    0.40% (39642)
'H' (72):        0.40% (39608)
<...>
'L' (76):        0.39% (38617)
'\246' (246):    0.39% (38609)
'I' (73):        0.38% (38462)
'q' (113):       0.38% (38437)
   9,857,106,520 bytes allocated in the heap
  14,492,245,840 bytes copied during GC
   3,406,696,360 bytes maximum residency (13 sample(s))
      14,691,672 bytes maximum slop
            6629 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     18348 colls,     0 par   10.90s   10.90s     0.0006s    0.0180s
  Gen  1        13 colls,     0 par   15.20s   19.65s     1.5119s    12.6403s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   14.45s  ( 14.79s elapsed)
  GC      time   26.10s  ( 30.56s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   40.55s  ( 45.35s elapsed)

  %GC     time      64.4%  (67.4% elapsed)

  Alloc rate    682,148,818 bytes per MUT second

  Productivity  35.6% of total user, 31.9% of total elapsed

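So, unless I'm misreading the debug output above, my program is using 6 GB? The test data is under 10 MB, so what is going on?

Any general advice on how to troubleshoot this kind of problem in Haskell would also be welcome. In other words, should I avoid Haskell for this sort of I/O-centric task? Should I use a pipes library for this kind of thing?

Edit: Thanks for the help. Correctly importing the strict version of IntMap fixed the memory problem.

I couldn't get profiling (-fprof-auto) to work, because none of my packages seem to have been compiled with profiling support. I fixed the missing profiling base libraries by installing my OS's GHC profiling package (Ubuntu: ghc-prof), but according to this I would need to manually reinstall all of my Haskell libraries with profiling enabled. I don't have time for that right now, so I'm just leaving this link here for anyone with a similar problem.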

Best answer

If you compile with -fprof-auto as described in the GHC guide chapter on profiling, you will see a large amount of allocation happening in run.fn.new_map and run.fn.

The problematic code:

new_map x = IntMap.insert (k x) ((old_val x)+1) mp

Suspicion: ((old_val x)+1) is building up a chain of unevaluated thunks. Suggested change:

new_map x = let ov  = old_val x + 1 in
            ov `seq` IntMap.insert (k x) ov mp

Voilà! Allocation, GC time, and memory usage all drop dramatically.
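
The bang pattern on `mp` in the question only forces the map itself to weak head normal form; with the lazy `Data.IntMap` API that does not evaluate the values stored inside it, which is why the `+1` thunks could pile up. The following is a minimal sketch of that difference, not code from the question: `throwsWhenForced` is a hypothetical helper, and `undefined` stands in for a value that has not been evaluated yet.

```haskell
import qualified Data.IntMap as Lazy          -- same API as Data.IntMap.Lazy
import qualified Data.IntMap.Strict as Strict
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical helper: force the argument to weak head normal form
-- and report whether doing so raised an exception.
throwsWhenForced :: a -> IO Bool
throwsWhenForced x = do
  r <- try (evaluate x)
  return $ case r of
    Left e  -> (e :: SomeException) `seq` True
    Right _ -> False

main :: IO ()
main = do
  -- Lazy insert stores the value as an unevaluated thunk, so forcing
  -- the map does not touch it.
  lazyThrows <- throwsWhenForced
    (Lazy.insert 0 (undefined :: Integer) Lazy.empty)
  -- Strict insert evaluates the value before storing it, so forcing
  -- the map hits the undefined value.
  strictThrows <- throwsWhenForced
    (Strict.insert 0 (undefined :: Integer) Strict.empty)
  print (lazyThrows, strictThrows)
```

The `seq` in the suggested change plays the same role as the strict insert here: it makes sure the new count is evaluated before it goes into the map.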

Edit: You probably meant to write import qualified Data.IntMap.Strict as IntMap, which makes this change unnecessary.
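
As a rough illustration of the strict import, the counting step can be written against `Data.IntMap.Strict` without any explicit `seq`. This is a sketch under assumptions rather than the asker's final code: `countBytes` is a hypothetical pure function that takes byte values as a list of `Int`, and it uses `insertWith` and `foldl'` in place of the question's `findWithDefault` plus `insert` loop.

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as IntMap
import Data.IntMap.Strict (IntMap)

-- Hypothetical helper: count how often each byte value occurs.
-- The strict insertWith evaluates the updated count before storing
-- it, so no chain of (+1) thunks builds up in the map.
countBytes :: [Int] -> IntMap Integer
countBytes = foldl' step IntMap.empty
  where
    step mp b = IntMap.insertWith (+) b 1 mp

main :: IO ()
main = print (IntMap.toList (countBytes [72, 73, 72, 137, 72]))
-- prints [(72,3),(73,1),(137,1)]
```

The strict left fold keeps the accumulator map evaluated at each step, which mirrors what the bang patterns on `n` and `mp` do in the question's `fn` loop.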

Regarding "performance - Why does this Haskell program perform so badly?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20850260/