
performance - Haskell Repa stencil tricks

Reposted. Author: 行者123. Updated: 2023-12-04 03:11:01

Question

I am trying to understand how Repa works, and I took the "blur" example code from the Repa Examples package. The code uses the stencil2 quasi-quoter:

    [stencil2|   2  4  5  4  2
                 4  9 12  9  4
                 5 12 15 12  5
                 4  9 12  9  4
                 2  4  5  4  2 |]

This is just a Template Haskell snippet, which generates a function:
    makeStencil2 5 5 coeffs where
      {-# INLINE[~0] coeffs #-}
      coeffs = \ ix -> case ix of
        Z :. -2 :. -2 -> Just 2
        Z :. -2 :. -1 -> Just 4
        Z :. -2 :.  0 -> Just 5
        Z :. -2 :.  1 -> Just 4
        Z :. -2 :.  2 -> Just 2
        [...]
        _             -> Nothing

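Using TH is fine, but I would like to keep the coefficients in a Repa array. I therefore changed the code to use a Repa Array, but my version runs 2 times slower than the original.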

Some notes

I noticed that the Repa authors use a hard-coded 7 x 7 matrix of values to obtain the coefficients:
http://hackage.haskell.org/package/repa-3.2.3.3/docs/src/Data-Array-Repa-Stencil-Dim2.html#forStencil2
(see: template7x7)

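Questions
  • Why is my version not optimized as well as the original, and how can we fix it? I want to write a "convolve" function that lets me run a convolution of a stencil (held in a Repa array) over an image.
  • Do we really have to use such a hard-coded matrix to let GHC optimize the code? Is there really no way to write fast Haskell code without such "hacks"?

Code

Original blur function: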
    blur    :: Monad m => Int -> Array U DIM2 Double -> m (Array U DIM2 Double)
    blur !iterations arrInit
     = go iterations arrInit
     where  go !0 !arr = return arr
            go !n !arr
             = do   arr' <- computeP
                          $ A.smap (/ 159)
                          $ forStencil2 BoundClamp arr
                            [stencil2|   2  4  5  4  2
                                         4  9 12  9  4
                                         5 12 15 12  5
                                         4  9 12  9  4
                                         2  4  5  4  2 |]
                    go (n-1) arr'
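
A side check on the `(/ 159)` factor in the function above: 159 is the sum of the 25 stencil coefficients, so each output pixel is a weighted average of its neighbourhood. A minimal sketch in plain Haskell that confirms the sum (the names `kernel` and `kernelSum` are illustrative and do not come from the Repa examples):

```haskell
-- Illustrative names; plain lists only, no Repa required.
-- The 5x5 coefficient matrix from the stencil2 quasi-quote.
kernel :: [[Int]]
kernel =
  [ [2,  4,  5,  4, 2]
  , [4,  9, 12,  9, 4]
  , [5, 12, 15, 12, 5]
  , [4,  9, 12,  9, 4]
  , [2,  4,  5,  4, 2]
  ]

-- Sum of all 25 coefficients: the divisor used by `smap (/ 159)`.
kernelSum :: Int
kernelSum = sum (concat kernel)
```

If the coefficients are changed, the divisor has to be changed to match, otherwise the image gets brighter or darker with every iteration.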

My blur function:
    blur !iterations arrInit = go iterations arrInit
     where
        stencilx7 = fromListUnboxed (Z :. 7 :. 7)
            [ 0, 0,  0,  0,  0, 0, 0
            , 0, 2,  4,  5,  4, 2, 0
            , 0, 4,  9, 12,  9, 4, 0
            , 0, 5, 12, 15, 12, 5, 0
            , 0, 4,  9, 12,  9, 4, 0
            , 0, 2,  4,  5,  4, 2, 0
            , 0, 0,  0,  0,  0, 0, 0
            ] :: Array U DIM2 Int
        magicf (Z :. x :. y) = Just $ fromIntegral $ unsafeIndex stencilx7 (Z:. (x+3) :. (y+3))
        go !0 !arr = return arr
        go !n !arr
         = do
            arr' <- computeP
                  $ A.smap (/ 159)
                  $ A.forStencil2 BoundClamp arr
                  $ makeStencil2 5 5 magicf
            go (n-1) arr'

The rest of the code:
    {-# LANGUAGE PackageImports, BangPatterns, TemplateHaskell, QuasiQuotes #-}
    {-# OPTIONS -Wall -fno-warn-missing-signatures -fno-warn-incomplete-patterns #-}

    import Data.List
    import Control.Monad
    import System.Environment
    import Data.Word
    import Data.Array.Repa.IO.BMP
    import Data.Array.Repa.IO.Timing
    import Data.Array.Repa                          as A
    import qualified Data.Array.Repa.Repr.Unboxed   as U
    import Data.Array.Repa.Stencil                  as A
    import Data.Array.Repa.Stencil.Dim2             as A
    import Prelude                                  as P

    main
     = do   args <- getArgs
            case args of
              [iterations, fileIn, fileOut] -> run (read iterations) fileIn fileOut
              _                             -> usage

    usage = putStr $ unlines
          [ "repa-blur <iterations::Int> <fileIn.bmp> <fileOut.bmp>" ]


    -- | Perform the blur.
    run :: Int -> FilePath -> FilePath -> IO ()
    run iterations fileIn fileOut
     = do   arrRGB <- liftM (either (error . show) id)
                    $ readImageFromBMP fileIn

            arrRGB `deepSeqArray` return ()
            let (arrRed, arrGreen, arrBlue) = U.unzip3 arrRGB
            let comps = [arrRed, arrGreen, arrBlue]

            (comps', tElapsed)
             <- time $ P.mapM (process iterations) comps

            putStr $ prettyTime tElapsed

            let [arrRed', arrGreen', arrBlue'] = comps'
            writeImageToBMP fileOut
                (U.zip3 arrRed' arrGreen' arrBlue')


    process :: Monad m => Int -> Array U DIM2 Word8 -> m (Array U DIM2 Word8)
    process iterations
     = promote >=> blur iterations >=> demote
    {-# NOINLINE process #-}


    promote :: Monad m => Array U DIM2 Word8 -> m (Array U DIM2 Double)
    promote arr
     = computeP $ A.map ffs arr
     where  {-# INLINE ffs #-}
            ffs :: Word8 -> Double
            ffs x = fromIntegral (fromIntegral x :: Int)
    {-# NOINLINE promote #-}


    demote :: Monad m => Array U DIM2 Double -> m (Array U DIM2 Word8)
    demote arr
     = computeP $ A.map ffs arr

     where  {-# INLINE ffs #-}
            ffs :: Double -> Word8
            ffs x = fromIntegral (truncate x :: Int)

Compiled with: ghc -O2 -threaded -fllvm -fforce-recomp Main.hs -ddump-splices

Best answer

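  • Reading the convolution coefficients from an array cannot, in theory, be as fast as having the constants soldered directly into the compiled code, because the latter approach costs nothing at the machine level.
  • No, GHC is able to break down static stencils of arbitrary size. See my implementation of static convolutions, built on fixed-vector and lambdas: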
    [dim2St|  1   2   1
              0   0   0
             -1  -2  -1 |]
    -->
    Dim2Stencil
      n3
      n3
      (VecList
         [VecList
            [\ acc a -> return (acc + a),
             \ acc a -> (return $ (acc + (2 * a))),
             \ acc a -> return (acc + a)],
          VecList
            [\ acc _ -> return acc,
             \ acc _ -> return acc,
             \ acc _ -> return acc],
          VecList
            [\ acc a -> return (acc - a),
             \ acc a -> (return $ (acc + (-2 * a))),
             \ acc a -> return (acc - a)]])
      (\ acc a reduce -> reduce acc a)
      (return 0)
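
To illustrate the first point of the answer above within Repa itself, the coefficients can be kept as literals in an ordinary function and handed to `makeStencil2`, with neither the quasi-quoter nor an array lookup. This is only a sketch: `blurCoeffs` and `blurStencil` are hypothetical names, the symmetry shortcut via `abs` is an assumption of this example rather than what `stencil2` generates, and its performance has not been measured here.

```haskell
import Data.Array.Repa              (DIM2, Z (..), (:.) (..))
import Data.Array.Repa.Stencil      (Stencil)
import Data.Array.Repa.Stencil.Dim2 (makeStencil2)

-- Hypothetical helper: the 5x5 blur coefficients as literal constants.
-- The kernel is symmetric in both axes, so only |y| and |x| matter
-- (an assumption of this sketch, not taken from the Repa sources).
blurCoeffs :: DIM2 -> Maybe Double
blurCoeffs (Z :. y :. x) = case (abs y, abs x) of
  (0, 0) -> Just 15
  (0, 1) -> Just 12
  (1, 0) -> Just 12
  (1, 1) -> Just 9
  (0, 2) -> Just 5
  (2, 0) -> Just 5
  (1, 2) -> Just 4
  (2, 1) -> Just 4
  (2, 2) -> Just 2
  _      -> Nothing          -- outside the 5x5 window
{-# INLINE blurCoeffs #-}

-- Hypothetical name: the stencil built from the constant function above.
blurStencil :: Stencil DIM2 Double
blurStencil = makeStencil2 5 5 blurCoeffs
```

`blurStencil` can then be passed to `forStencil2 BoundClamp arr` in place of the quasi-quoted stencil. Whether GHC folds the `abs` calls away after inlining should be checked in the generated Core before relying on this form.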
Source: the original question on Stack Overflow, https://stackoverflow.com/questions/19749343/
