performance - How to optimize a loop that can be fully strict

Reposted by: 行者123. Updated: 2023-12-04 14:31:16

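I'm trying to write a brute-force solution to Project Euler Problem #145, and I cannot get my solution to run in less than about 1 minute 30 seconds.

(I'm aware there are various shortcuts and even paper-and-pencil solutions; for the purpose of this question I'm not considering those.)

In the best version I've come up with so far, profiling shows that most of the time is spent in foldDigits. This function need not be lazy at all, and to my mind it ought to be optimized to a simple loop. As you can see, I've attempted to make various parts of the program strict.

So my question is: without changing the overall algorithm, is there some way to bring the execution time of this program down to the sub-minute mark?

(Or, if not, is there a way to see that the code of foldDigits is as optimized as possible?)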

-- ghc -O3 -threaded Euler-145.hs && Euler-145.exe +RTS -N4

{-# LANGUAGE BangPatterns #-}

import Control.Parallel.Strategies

foldDigits :: (a -> Int -> a) -> a -> Int -> a
foldDigits f !acc !n
    | n < 10    = i
    | otherwise = foldDigits f i d
  where (d, m) = n `quotRem` 10
        !i     = f acc m

reverseNumber :: Int -> Int
reverseNumber !n
    = foldDigits accumulate 0 n
  where accumulate !v !d = v * 10 + d

allDigitsOdd :: Int -> Bool
allDigitsOdd n
    = foldDigits andOdd True n
  where andOdd !a d = a && isOdd d
        isOdd !x    = x `rem` 2 /= 0

isReversible :: Int -> Bool
isReversible n
    = notDivisibleByTen n && allDigitsOdd (n + rn)
  where rn                   = reverseNumber n
        notDivisibleByTen !x = x `rem` 10 /= 0

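-- Count the reversible numbers in the inclusive range [start, end].
countRange acc start end
    | start > end = acc
    | otherwise   = countRange (acc + v) (start + 1) end
  where v = if isReversible start then 1 else 0

-- Split the search space into four chunks and count them in parallel.
-- Adjacent chunks share an endpoint, but every shared value is a multiple
-- of 10 and therefore never reversible, so nothing is counted twice.
main
    = print $ sum $ parMap rseq cr ranges
  where max       = 1000000000
        qmax      = max `div` 4
        ranges    = [(1, qmax), (qmax, qmax * 2), (qmax * 2, qmax * 3), (qmax * 3, max)]
        cr (s, e) = countRange 0 s e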

Best answer

As it stands, the Core that ghc-7.6.1 generates for foldDigits (with -O2) is

Rec {
$wfoldDigits_r2cK
  :: forall a_aha.
     (a_aha -> GHC.Types.Int -> a_aha)
     -> a_aha -> GHC.Prim.Int# -> a_aha
[GblId, Arity=3, Caf=NoCafRefs, Str=DmdType C(C(S))SL]
$wfoldDigits_r2cK =
  \ (@ a_aha)
    (w_s284 :: a_aha -> GHC.Types.Int -> a_aha)
    (w1_s285 :: a_aha)
    (ww_s288 :: GHC.Prim.Int#) ->
    case w1_s285 of acc_Xhi { __DEFAULT ->
    let {
      ds_sNo [Dmd=Just D(D(T)S)] :: (GHC.Types.Int, GHC.Types.Int)
      [LclId, Str=DmdType]
      ds_sNo =
        case GHC.Prim.quotRemInt# ww_s288 10
        of _ { (# ipv_aJA, ipv1_aJB #) ->
        (GHC.Types.I# ipv_aJA, GHC.Types.I# ipv1_aJB)
        } } in
    case w_s284 acc_Xhi (case ds_sNo of _ { (d_arS, m_Xsi) -> m_Xsi })
    of i_ahg { __DEFAULT ->
    case GHC.Prim.<# ww_s288 10 of _ {
      GHC.Types.False ->
        case ds_sNo of _ { (d_Xsi, m_Xs5) ->
        case d_Xsi of _ { GHC.Types.I# ww1_X28L ->
        $wfoldDigits_r2cK @ a_aha w_s284 i_ahg ww1_X28L
        }
        };
      GHC.Types.True -> i_ahg
    }
    }
    }
end Rec }

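As you can see, it re-boxes the results of the quotRem call. The problem is that no properties of f are available here, and, being a recursive function, foldDigits cannot be inlined.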
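For comparison, here is a minimal sketch of the kind of loop one would hope to end up with once f is known at the call site, written out by hand. The names reverseNumberDirect and allDigitsOddDirect are hypothetical and do not appear in the original program; whether GHC actually unboxes the arguments still has to be confirmed by reading the generated Core.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hand-specialised digit reversal (hypothetical name, for illustration).
-- Both arguments of the local loop are plain Ints used strictly, which
-- gives GHC's strictness analysis the chance to pass them unboxed.
reverseNumberDirect :: Int -> Int
reverseNumberDirect = go 0
  where
    go !acc 0 = acc
    go !acc n = case n `quotRem` 10 of
                  (q, r) -> go (acc * 10 + r) q

-- Hand-specialised "all digits odd" test (hypothetical name).
-- Stops at the first even digit; only meaningful for positive inputs.
allDigitsOddDirect :: Int -> Bool
allDigitsOddDirect n0 = n0 > 0 && go n0
  where
    go 0 = True
    go n = case n `quotRem` 10 of
             (q, r) -> odd r && go q

main :: IO ()
main = do
  print (reverseNumberDirect 409)  -- 904
  print (allDigitsOddDirect 1313)  -- True
```

Writing every loop by hand like this gives up the shared foldDigits abstraction, which is the cost the transformation discussed next avoids.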

With a manual worker-wrapper transform making the function argument static,

foldDigits :: (a -> Int -> a) -> a -> Int -> a
foldDigits f = go
  where
    go !acc 0 = acc
    go acc n = case n `quotRem` 10 of
                 (q,r) -> go (f acc r) q

foldDigits becomes inlinable, and you get specialised versions for your uses that operate on unboxed data, but no top-level foldDigits, e.g.

Rec {
$wgo_r2di :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType LL]
$wgo_r2di =
  \ (ww_s28F :: GHC.Prim.Int#) (ww1_s28J :: GHC.Prim.Int#) ->
    case ww1_s28J of ds_XJh {
      __DEFAULT ->
        case GHC.Prim.quotRemInt# ds_XJh 10
        of _ { (# ipv_aJK, ipv1_aJL #) ->
        $wgo_r2di (GHC.Prim.+# (GHC.Prim.*# ww_s28F 10) ipv1_aJL) ipv_aJK
        };
      0 -> ww_s28F
    }
end Rec }

and the effect on computation time is tangible. For the original, I got

$ ./eul145 +RTS -s -N2
608720
1,814,289,579,592 bytes allocated in the heap
     196,407,088 bytes copied during GC
          47,184 bytes maximum residency (2 sample(s))
          30,640 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     1827331 colls, 1827331 par   23.77s   11.86s     0.0000s    0.0041s
  Gen  1         2 colls,     1 par    0.00s    0.00s     0.0001s    0.0001s

  Parallel GC work balance: 54.94% (serial 0%, perfect 100%)

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)

  SPARKS: 4 (3 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time  620.52s  (313.51s elapsed)
  GC      time   23.77s  ( 11.86s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time  644.29s  (325.37s elapsed)

  Alloc rate    2,923,834,808 bytes per MUT second

(I used -N2 since my i5 has only two physical cores), vs.

$ ./eul145 +RTS -s -N2
608720
  16,000,063,624 bytes allocated in the heap
         403,384 bytes copied during GC
          47,184 bytes maximum residency (2 sample(s))
          30,640 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     15852 colls, 15852 par    0.34s    0.17s     0.0000s    0.0037s
  Gen  1         2 colls,     1 par    0.00s    0.00s     0.0001s    0.0001s

  Parallel GC work balance: 43.86% (serial 0%, perfect 100%)

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)

  SPARKS: 4 (3 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time  314.85s  (160.08s elapsed)
  GC      time    0.34s  (  0.17s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time  315.20s  (160.25s elapsed)

  Alloc rate    50,817,657 bytes per MUT second

  Productivity  99.9% of total user, 196.5% of total elapsed

with the modification. The running time is roughly halved, and allocation is reduced roughly 100-fold.
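To try the transformation quickly, the pieces above can be assembled into a small sequential program. This is only a sketch under stated assumptions: the upper bound 999 and the single-threaded countRange are chosen here so that it finishes instantly, and the lambdas replace the question's local helper functions. The Project Euler problem statement gives 120 reversible numbers below one thousand, which serves as a sanity check.

```haskell
{-# LANGUAGE BangPatterns #-}

-- foldDigits with the function argument made static, as in the answer:
-- f is captured once by the wrapper and the local loop go does the work.
foldDigits :: (a -> Int -> a) -> a -> Int -> a
foldDigits f = go
  where
    go !acc 0 = acc
    go acc n = case n `quotRem` 10 of
                 (q, r) -> go (f acc r) q

reverseNumber :: Int -> Int
reverseNumber = foldDigits (\v d -> v * 10 + d) 0

allDigitsOdd :: Int -> Bool
allDigitsOdd = foldDigits (\a d -> a && odd d) True

isReversible :: Int -> Bool
isReversible n = n `rem` 10 /= 0 && allDigitsOdd (n + reverseNumber n)

-- Count the reversible numbers in the inclusive range [start, end].
-- Sequential on purpose (an assumption of this sketch, not the original).
countRange :: Int -> Int -> Int -> Int
countRange !acc start end
  | start > end = acc
  | otherwise   = countRange (acc + fromEnum (isReversible start)) (start + 1) end

main :: IO ()
main = print (countRange 0 1 999)  -- prints 120
```

Compiling this with ghc -O2 -ddump-simpl prints the generated Core, so the loops can be checked for unboxed Int# arguments in the same way as the listings above.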

Regarding "performance - How to optimize a loop that can be fully strict", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13252992/
