gpt4 book ai didi

list - Haskell:搜索大尺寸列表的最佳方式

转载 作者:行者123 更新时间:2023-12-05 01:37:10 25 4
gpt4 key购买 nike

我有一个 100K+ 元素的列表,这是一个有限列表。目前我正在使用 Data.List 函数 elem。在查看 Data.List 信息页面时,还有查找和过滤器。其中之一会比 elem 函数更快吗?

最佳答案

以防万一我们还没有完全击败死马......

不同的集合表示存在巨大的性能差异。作为一个示例(可能与您的用例匹配也可能不匹配),请考虑采用 200K 个随机元素的列表和计算来确定 200 个随机元素的成员资格。

我已经实现了三种明显的方法来做到这一点 - 使用 elem在列表中,转换为 HashSet检查成员资格,并执行布隆过滤器和哈希集的混合。基准测试显示列表解决方案比哈希集慢 3 个数量级,比混合慢约 2 倍。

benchmarking list
mean: 460.7106 ms, lb 459.2952 ms, ub 462.8491 ms, ci 0.950
std dev: 8.741096 ms, lb 6.293703 ms, ub 12.23082 ms, ci 0.950

benchmarking hashset
mean: 175.2730 us, lb 173.9140 us, ub 177.0802 us, ci 0.950
std dev: 7.966790 us, lb 6.391454 us, ub 10.25774 us, ci 0.950

benchmarking bloom+hashset
mean: 88.22402 us, lb 87.35856 us, ub 89.66884 us, ci 0.950
std dev: 5.642663 us, lb 3.793715 us, ub 8.264222 us, ci 0.950

和代码:
import qualified Data.HashSet as Set
import Data.HashSet (Set)
import qualified Data.BloomFilter as BF
import qualified Data.BloomFilter.Easy as BF
import Data.BloomFilter (Bloom)
import Data.BloomFilter.Hash as H2
import Data.Hashable as H1
import Criterion.Main
import System.Random

data MySet a = MS (Set a) (Bloom a)

fromList :: (H2.Hashable a, H1.Hashable a, Ord a) => [a] -> MySet a
fromList as =
let hs = Set.fromList as
bf = BF.easyList 0.2 as
in hs `seq` bf `seq` MS hs bf

member :: (H2.Hashable a, H1.Hashable a, Ord a) => a -> MySet a -> Bool
member e (MS hs bf)
| BF.elemB e bf = Set.member e hs
| otherwise = False

main = do
list <- take 200000 `fmap` randomsIO :: IO [Int]
xs <- take 200 `fmap` randomsIO
let hs = Set.fromList list
bhs = fromList list
defaultMain
[ bench "list" $ nf (map (`elem` list)) xs
, bench "hashset" $ nf (map (`Set.member` hs)) xs
, bench "bloom+hashset" $ nf (map (`member` bhs)) xs
]

randomsIO = randoms `fmap` newStdGen

关于list - Haskell:搜索大尺寸列表的最佳方式,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22638997/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com