algorithm - Haskell: optimizing a graph-processing algorithm

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 03:01:39

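This is a follow-up to this post. The code is now based on Structuring Depth-First Search Algorithms in Haskell, the 1990s paper by King and Launchbury on doing depth-first search in Haskell. The paper proposes a generate-and-prune strategy, but it uses mutable arrays inside a State monad (I suspect some of that syntax is now deprecated). The authors hint that a set could be used to remember visited nodes, at an extra O(log n) cost. I tried implementing it with a set (we have much better machines than in the 1990s!), using modern State monad syntax, and using Vectors instead of arrays (from what I've read, they are generally better).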

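As before, my code runs on small datasets but never returns on the 5M-edge graph I need to analyse, and I'm looking for hints about where it breaks down at scale. I do know the data fits comfortably in memory, so that isn't the problem, but have I inadvertently slipped into O(n²)? (By comparison, the official implementation of this paper in the Data.Graph library, from which I recently borrowed some code as well, uses a mutable array but fails on the large dataset... with a stack overflow!!!)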

So now I have a Vector data store with IntSet state that never finishes, and an "official" ST-monad array version that crashes! Surely Haskell can do better than this?
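
The Data.Graph stack overflow mentioned above most likely comes from recursion whose depth grows with the length of the longest DFS path. Below is a hedged sketch of one way around that. It is not the paper's code and not the Data.Graph implementation. It keeps the paper's idea of a mutable visited array in the ST monad, but uses an unboxed STUArray from the array package and an explicit stack of (vertex, remaining neighbours) frames instead of recursion. It assumes vertices are numbered 0 to V.length g - 1, as in the question's Graph (slot 0 simply goes unused for 1-based input). postOrderST is a hypothetical name.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray)
import qualified Data.Vector as V

-- Adjacency lists indexed by vertex, same shape as the question's Graph.
type Graph = V.Vector [Int]

-- Post-order of a depth-first search from the given start vertices.
-- Recursion depth does not grow with the graph: the DFS path lives in an
-- explicit list-based stack, and "visited" is an unboxed mutable array.
postOrderST :: Graph -> [Int] -> [Int]
postOrderST g starts = runST $ do
  visited <- newArray (0, V.length g - 1) False :: ST s (STUArray s Int Bool)
  let
    -- Each frame is (vertex, neighbours not yet explored).
    -- acc collects finished vertices in reverse post-order.
    go [] acc = return acc
    go ((v, []) : rest) acc = go rest (v : acc)   -- v is finished
    go ((v, w : ws) : rest) acc = do
      seen <- readArray visited w
      if seen
        then go ((v, ws) : rest) acc
        else do
          writeArray visited w True
          go ((w, g V.! w) : (v, ws) : rest) acc
    -- Start a new DFS tree from every start vertex not yet visited.
    loop [] acc = return (reverse acc)
    loop (s : ss) acc = do
      seen <- readArray visited s
      if seen
        then loop ss acc
        else do
          writeArray visited s True
          acc' <- go [(s, g V.! s)] acc
          loop ss acc'
  loop starts []
```

Called with the same start order as the question's dff, for example postOrderST g (reverse (vertices g)), it is intended to play the role of postOrd g without building the intermediate forest.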

import Data.Vector (Vector)
import qualified Data.IntSet as IS
import qualified Data.Vector as V
import qualified Data.ByteString.Char8 as BS
import Control.Monad.State

type Vertex = Int
type Table a = Vector a
type Graph = Table [Vertex]
type Edge = (Vertex, Vertex)
data Tree a = Node a (Forest a) deriving (Show,Eq)
type Forest a = [Tree a]
-- ghc -O2 -threaded --make
-- +RTS -Nx
generate :: Graph -> Vertex -> Tree Vertex
generate g v = Node v $ map (generate g) (g V.! v)

chop :: Forest Vertex -> State IS.IntSet (Forest Vertex)
chop [] = return []
chop (Node x ts:us) = do
  visited <- contains x
  if visited then
    chop us
  else do
    include x
    x1 <- chop ts
    x2 <- chop us
    return (Node x x1:x2)

prune :: Forest Vertex -> State IS.IntSet (Forest Vertex)
prune vs = chop vs

main = do
  --edges <- V.fromList `fmap` getEdges "testdata.txt"
  edges <- V.fromList `fmap` getEdges "SCC.txt"
  let
    -- calculate size of five largest SCC
    maxIndex = fst $ V.last edges
    gr = buildG maxIndex edges
    sccRes = scc gr
    big5 = take 5 sccRes
    big5' = map (\l -> length $ postorder l) big5
  putStrLn $ show $ big5'

contains :: Vertex -> State IS.IntSet Bool
contains v = state $ \visited -> (v `IS.member` visited, visited)

include :: Vertex -> State IS.IntSet ()
include v = state $ \visited -> ((), IS.insert v visited)


getEdges :: String -> IO [Edge]
getEdges path = do
  lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
  let pairs = (map . map) (maybe (error "can't read Int") fst . BS.readInt) lines
  return [(a, b) | [a, b] <- pairs]

vertices :: Graph -> [Vertex]
vertices gr = [1.. (V.length gr - 1)]

edges :: Graph -> [Edge]
edges g = [(u,v) | u <- vertices g, v <- g V.! u]

-- accumulate :: (a -> b -> a) -> Vector a -> Vector (Int, b) -> Vector a
-- accumulating function f
-- initial vector (of length m)
-- vector of index/value pairs (of length n)
buildG :: Int -> Table Edge -> Graph
buildG maxIndex edges = graph' where
  graph = V.replicate (maxIndex + 1) []
  --graph' = V.accumulate (\existing new -> new:existing) graph edges
  -- flip f takes its (first) two arguments in the reverse order of f
  graph' = V.accumulate (flip (:)) graph edges

mapT :: Ord a => (Vertex -> a -> b) -> Table a -> Table b
mapT = V.imap

outDegree :: Graph -> Table Int
outDegree g = mapT numEdges g
  where numEdges v es = length es

indegree :: Graph -> Table Int
indegree g = outDegree $ transposeG g

transposeG :: Graph -> Graph
transposeG g = buildG (V.length g - 1) (reverseE g)

reverseE :: Graph -> Table Edge
reverseE g = V.fromList [(w, v) | (v,w) <- edges g]

-- --------------------------------------------------------------

postorder :: Tree a -> [a]
postorder (Node a ts) = postorderF ts ++ [a]

postorderF :: Forest a -> [a]
postorderF ts = concat (map postorder ts)

postOrd :: Graph -> [Vertex]
postOrd g = postorderF (dff g)

dfs :: Graph -> [Vertex] -> Forest Vertex
dfs g vs = map (generate g) vs

dfs' :: Graph -> [Vertex] -> Forest Vertex
dfs' g vs = fst $ runState (prune d) $ IS.fromList []
  where d = dfs g vs

dff :: Graph -> Forest Vertex
dff g = dfs' g $ reverse (vertices g)

scc :: Graph -> Forest Vertex
scc g = dfs' g $ reverse $ postOrd (transposeG g)
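
On the O(n²) question, one candidate worth checking is postorder. Its `postorderF ts ++ [a]` appends to the end of everything produced below each node. On a deep DFS tree, such as one long path, (++) re-copies that list at every level, so a path of n vertices costs roughly n²/2 cons steps. The containers implementation of Data.Graph avoids this by building the post-order as a difference list (function composition). Here is a minimal sketch of both versions on the question's rose-tree shape. postorderSlow and postorderFast are hypothetical names.

```haskell
-- Rose tree with the same shape as the question's Tree/Forest.
data Tree a = Node a [Tree a] deriving Show

-- The question's version: quadratic on deep trees, because each
-- (++ [a]) copies the whole list built for the subtrees below it.
postorderSlow :: Tree a -> [a]
postorderSlow (Node a ts) = concatMap postorderSlow ts ++ [a]

-- Linear version: thread the rest of the output through as an
-- accumulator, so every node is consed exactly once.
postorderFast :: Tree a -> [a]
postorderFast t = go t []
  where
    -- post-order of the subtrees, then the node itself, then rest
    go (Node a ts) rest = goF ts (a : rest)
    -- post-order of each tree in the forest, left to right, then rest
    goF [] rest = rest
    goF (u : us) rest = go u (goF us rest)
```

postorderFast has the same type as postorder, so it could be swapped in for both postorder and postorderF (as concatMap postorderFast) without touching the rest of the program.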

Best answer

Some possible minor improvements:

Change

type Edge = (Vertex, Vertex)

to

data Edge = Edge {-# UNPACK #-} !Vertex {-# UNPACK #-} !Vertex

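This reduces the memory used by each edge from 7 words to 3 and improves cache locality. Reducing memory pressure almost always improves running time. As @jberryman mentioned, you could instead use an unboxed vector for Table Edge (in which case you don't need the custom data type above).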
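
Here is a minimal sketch of @jberryman's suggestion, assuming the vector package's Data.Vector.Unboxed. The edge table becomes a U.Vector (Int, Int), which vector stores as two flat Int arrays rather than one heap-allocated tuple per edge. buildGU is a hypothetical variant of the question's buildG. It still produces the question's boxed Vector [Int] graph.

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

-- Edge table as an unboxed vector of (source, target) pairs; internally
-- this is a pair of unboxed Int arrays, with no per-edge heap objects.
type EdgeTable = U.Vector (Int, Int)

-- Variant of the question's buildG that takes the unboxed table.
-- V.accumulate expects a boxed vector of (index, value) pairs, so the
-- table is converted with V.convert, which allocates boxed pairs only
-- transiently while the graph is built.
buildGU :: Int -> EdgeTable -> V.Vector [Int]
buildGU maxIndex es =
  V.accumulate (flip (:)) (V.replicate (maxIndex + 1) []) (V.convert es)
```

The table itself can be built with U.fromList from the parsed pairs that getEdges returns.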

generate :: Graph -> Vertex -> Tree Vertex
generate g v = Node v $ map (generate g) (g V.! v)

If you are sure the indices are in bounds, you can use the unsafe indexing function from vector, V.unsafeIndex, instead of V.!.
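
For example, here is the question's generate with the bounds check removed. As the answer says, this assumes every vertex stored in the adjacency lists is a valid index. V.unsafeIndex does not check, so a bad index is not reported as an exception. generateU is a hypothetical name.

```haskell
import qualified Data.Vector as V

data Tree a = Node a [Tree a] deriving (Show, Eq)

-- Same as the question's generate, but indexing with V.unsafeIndex.
-- Assumes 0 <= v < V.length g for every vertex reached; otherwise the
-- program may crash or read garbage instead of raising an error.
generateU :: V.Vector [Int] -> Int -> Tree Int
generateU g v = Node v (map (generateU g) (V.unsafeIndex g v))
```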

contains :: Vertex -> State IS.IntSet Bool
contains v = state $ \visited -> (v `IS.member` visited, visited)

Use a combination of get and put $! instead.
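
Here is a sketch of contains along those lines. It assumes the strict State monad from mtl's Control.Monad.State.Strict, whereas the question imports the lazy Control.Monad.State. put $! forces the set to weak head normal form before storing it. IntSet's constructors are strict, so that evaluates the whole set.

```haskell
import Control.Monad.State.Strict
import qualified Data.IntSet as IS

-- Membership test that also forces the stored set, so the state is never
-- left as a chain of unevaluated inserts.
contains :: Int -> State IS.IntSet Bool
contains v = do
  visited <- get
  put $! visited
  return (v `IS.member` visited)
```

Since contains never changes the set, gets (IS.member v) is an equivalent, shorter choice once the rest of the code keeps the state evaluated.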

include :: Vertex -> State IS.IntSet ()
include v = state $ \visited -> ((), IS.insert v visited)

Use modify' instead.
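
modify' is the strict counterpart of modify: it applies the function and forces the result before storing it. Below is a sketch of include and the question's chop rewritten on top of it. It again assumes Control.Monad.State.Strict from mtl, with the vertex type specialised to Int.

```haskell
import Control.Monad.State.Strict
import qualified Data.IntSet as IS

data Tree a = Node a [Tree a] deriving (Show, Eq)

-- Insert v into the visited set, forcing the new set immediately.
include :: Int -> State IS.IntSet ()
include v = modify' (IS.insert v)

-- The question's chop, same logic, on the strict State monad.
-- It works on the lazily generated (possibly infinite) tree because
-- subtrees of already-visited vertices are never inspected.
chop :: [Tree Int] -> State IS.IntSet [Tree Int]
chop [] = return []
chop (Node x ts : us) = do
  visited <- gets (IS.member x)
  if visited
    then chop us
    else do
      include x
      x1 <- chop ts
      x2 <- chop us
      return (Node x x1 : x2)
```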

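You use a lot of lists in your program. Linked lists are not the most memory- or cache-efficient data structure. See whether you can convert your code to use vectors in more places.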
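
One list-free layout for the graph itself is compressed sparse row (CSR). This is a swapped-in technique, not something the answer names. One unboxed array holds all edge targets grouped by source vertex, and a second holds where each vertex's group starts. The sketch assumes the vector package. CSR, fromEdges and neighbours are hypothetical names. Vertices are numbered 0 to n - 1, so pass maxIndex + 1 for the question's 1-based input.

```haskell
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M

-- Compressed sparse row graph: the out-neighbours of v are
-- targets[offsets!v .. offsets!(v+1) - 1]. Two flat Int arrays, no lists.
data CSR = CSR
  { offsets :: !(U.Vector Int)  -- length n + 1
  , targets :: !(U.Vector Int)  -- one entry per edge
  }

-- Build a CSR graph over vertices 0 .. n-1 from (source, target) pairs.
-- Each vertex's neighbours keep the order in which their edges appear.
fromEdges :: Int -> U.Vector (Int, Int) -> CSR
fromEdges n es = CSR offs tgts
  where
    -- out-degree of every vertex
    degs = U.accumulate (+) (U.replicate n 0) (U.map (\(u, _) -> (u, 1)) es)
    -- running sum: offs!v is where v's neighbours start in tgts
    offs = U.scanl (+) 0 degs
    tgts = runST $ do
      out <- M.new (U.length es)
      -- next free slot per vertex, initialised to its start offset
      pos <- U.thaw (U.init offs)
      U.forM_ es $ \(u, w) -> do
        i <- M.read pos u
        M.write out i w
        M.write pos u (i + 1)
      U.freeze out

-- Out-neighbours of v as a slice of targets (no copying).
neighbours :: CSR -> Int -> U.Vector Int
neighbours g v = U.slice start (end - start) (targets g)
  where
    start = offsets g U.! v
    end = offsets g U.! (v + 1)
```

A DFS over this layout reads each neighbour list as a contiguous slice of one array, instead of following a cons cell per edge.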

A similar question, algorithm - Haskell: optimizing a graph-processing algorithm, can be found on Stack Overflow: https://stackoverflow.com/questions/24370976/
