gpt4 book ai didi

haskell - 从 "Real World Haskell?"实现 splitWith 的正确方法是什么

转载 作者:行者123 更新时间:2023-12-03 12:33:43 25 4
gpt4 key购买 nike

我一直在通过真实世界的 Haskell 工作,并尝试做练习。我设法实现了第 4.5 章练习 2 中的 splitWith 的工作版本。我觉得这不是一种非常 Haskell 的做事方式。必须用累加器实现一个新功能似乎很迂回。有没有更惯用的方法来做到这一点,比如折叠?我查看了 foldl 的文档,但我对如何操作感到头疼。

splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith _ [] = []
splitWith f a = splitWithAcc f a []
where
splitWithAcc :: (a -> Bool) -> [a] -> [[a]] -> [[a]]
splitWithAcc f xs acc
| null xs = acc
| f $ head xs = splitWithAcc f (dropWhile f xs) (acc ++ [takeWhile f xs])
| otherwise = splitWithAcc f (tail xs) acc

澄清

以下是练习的正文:

Write a function splitWith that acts similarly to words but takes a predicate and a list of any type, and then splits its input list on every element for which the predicate returns False:

最佳答案

递归是你的 friend ,但我会做一些不同的事情。首先,当我 split 时,我会让我的条件为真,而不是让它为假。其次,我会利用 Data.List 中的一个方便的函数叫 break

> :t break
break :: (a -> Bool) -> [a] -> ([a], [a])
> break (== ' ') "This is a test"
("This", " is a test")

我将使用它定义我的函数
splitWith' :: (a -> Bool) -> [a] -> [[a]]
splitWith' cond [] = []
splitWith' cond xs = first : splitWith' cond (safeTail rest)
where
(first, rest) = break cond xs
-- Need this function to handle an empty list
safeTail [] = []
safeTail (_:ys) = ys

或者,如果你想把它写得尽可能困惑
splitWith'' :: (a -> Bool) -> [a] -> [[a]]
splitWith'' _ [] = []
splitWith'' cond xs = uncurry (:) $ fmap (splitWith'' cond . safeTail) $ break cond xs
where
safeTail [] = []
safeTail (_:ys) = ys

这是有效的,因为 fmap over 2-tuples 将函数应用于元组的第二个元素。然后它解开 :并将其应用于第一个和其余的。

更新

如果您希望它在谓词为假时拆分,您可以使用 span而不是 break ,或者只是将其定义为
splitWithWeird cond xs = splitWith' (not . cond) xs

虽然第二个显然会产生稍微小的开销(除非编译器可以优化它)

更新 2

如果你想处理重复的字符,如果它适合你的需要,有一个简单、快速的解决方法:
> filter (not . null) $ splitWithWeird (/= ' ') "This  is   a    test"
["This","is","a","test"]

有了这样一个简单的修复,我们可能会想把它构建到算法本身中:
splitWithWeird :: (a -> Bool) -> [a] -> [[a]]
splitWithWeird cond [] = []
splitWithWeird cond xs = filter (not . null) $ first : splitWithWeird cond (safeTail rest)
where
(first, rest) = span cond xs
safeTail [] = []
safeTail (_:ys) = ys

但这将是一个坏主意。由于这是一个递归函数,您添加了对 filter (not . null) 的调用。在每个级别,所以在函数中的每个拆分位置。所有这些都必须在返回之前扫描整个列表,因此必须执行额外的检查。最好将它定义为一个单独的函数,以便 filter (not . null)只调用一次:
splitWithWeird' :: (a -> Bool) -> [a] -> [[a]]
splitWithWeird' cond xs = filter (not . null) $ splitWithWeird cond xs

或者,如果您希望将其内置到算法中:
splitWithWeird :: (a -> Bool) -> [a] -> [[a]]
splitWithWeird cond xs = filter (not . null) $ splitWithHelper cond xs
where
safeTail [] = []
safeTail (_:ys) = ys
splitWithHelper cond [] = []
splitWithHelper cond xs =
let (first, rest) = span cond xs
in first : splitWithHelper cond (safeTail rest)

这实际上只是在内部做与定义两个函数相同的事情。请注意,我必须使用附加的 let ... in ...声明在这里(我不喜欢嵌套 wheres)因为 (first, rest) = span cond xs属于 splitWithHelper , 不至 splitWithWeird .如果你把它留在 where 子句中,算法将不起作用。

更新 3

不想在这里只留下一个非理想的解决方案,我已经写了一个算法来分割子序列,而不仅仅是条件或元素。它确实使用了 first函数来自 Control.Arrow ,但只是为了使代码更加紧凑。
import Control.Arrow (first)

isPrefixOf :: Eq a => [a] -> [a] -> Bool
isPrefixOf [] _ = True
isPrefixOf _ [] = False
isPrefixOf (x:xs) (y:ys) = x == y && isPrefixOf xs ys

splitSubseq :: Eq a => [a] -> [a] -> [[a]]
splitSubseq sub [] = []
splitSubseq sub xs = initial : splitSubseq sub rest
where
lsub = length sub
splitter [] = ([], [])
splitter yss@(y:ys)
| isPrefixOf sub yss = ([], drop lsub yss)
| otherwise = first (y :) $ splitter ys
(initial, rest) = splitter xs

我并不是说这是一个有效的解决方案,但它应该很容易遵循。首先,我定义了一个函数 isPrefixOf如果第二个列表以第一个列表开头,则返回 True。

我想保持相同的递归模式( first : recursive rest ),所以我写了 splitter代替 spanbreak ,这就是 isPrefixOf进来。如果子序列是列表的前缀,则返回 ([], restAfterSubsequence) ,否则存储列表的第一个字符,然后从下一个元素开始重复此操作。我的使用 first这里只是为了让我可以递归和简洁地编写这个函数。它只适用 (y :)splitter 返回值的第一个元素.从 splitter 返回的元组的第二个元素只是尚未消耗的其余输入。

如果您有兴趣,这里是该算法的性能统计数据(使用 --make -O2 ,i5 quad 编译):
main = print $ sum $ take (10 ^ 7) $ map length $ splitSubseq " " $ cycle "Testing "

70000000
6,840,052,808 bytes allocated in the heap
2,032,868 bytes copied during GC
42,900 bytes maximum residency (2 sample(s))
22,636 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed) Avg pause Max pause
Gen 0 13114 colls, 0 par 0.06s 0.07s 0.0000s 0.0001s
Gen 1 2 colls, 0 par 0.00s 0.00s 0.0002s 0.0004s

TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

INIT time 0.00s ( 0.00s elapsed)
MUT time 3.68s ( 3.74s elapsed)
GC time 0.06s ( 0.07s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 3.74s ( 3.81s elapsed)

然后去嵌入求和和长度:
main = print $ sum $ take (10 ^ 7) $ map length $ repeat "Testing"

70000000
240,052,572 bytes allocated in the heap
12,812 bytes copied during GC
42,900 bytes maximum residency (2 sample(s))
22,636 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed) Avg pause Max pause
Gen 0 458 colls, 0 par 0.00s 0.00s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.00s 0.00s 0.0001s 0.0001s

TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

INIT time 0.00s ( 0.00s elapsed)
MUT time 0.09s ( 0.09s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.11s ( 0.09s elapsed)

所以我们可以看到,这只需要大约 0.1 秒的时间,让我们有大约 3.64 秒的时间让这个算法拆分由 "Testing " 组成的字符串。重复 1000 万次,所有这些都使用了少量的内存。唯一的缺点是,当使用 -threaded 编译时,该算法实际上会显着变慢。并以更多内核运行。

关于haskell - 从 "Real World Haskell?"实现 splitWith 的正确方法是什么,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/19386400/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com