gpt4 book ai didi

caching - 为什么 skiplist 内存局部性差但平衡树好?

转载 作者:可可西里 更新时间:2023-11-01 10:58:14 24 4
gpt4 key购买 nike

曾经有人质疑antirez(Redis 的作者)为什么Redis 在ycombinator 中使用跳跃列表来实现排序集。 :

I was looking at Redis yesterday and noticed this. Is there any particular reason you chose skip list instead of btrees except for simplicity? Skip lists consume more memory in pointers and are generally slower than btrees because of poor memory locality so traversing them means lots of cache misses. I also suggested a way to improve throughput when you guarantee each command's durability (at the end of the wiki page): http://code.google.com/p/redis/wiki/AppendOnlyFileHowto Also, have you thought about accommodating read-only traffic in an additional thread as a way to utilize at least two cores efficiently while sharing the same memory?

然后antirez回答:

There are a few reasons: 1) They are not very memory intensive. It's up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees. 2) A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees. 3) They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code. About the Append Only durability & speed, I don't think it is a good idea to optimize Redis at cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hint is big anyway. About threads: our experience shows that Redis is mostly I/O bound. I'm using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores), and using the "Redis Cluster" solution that I plan to develop in the future.

我仔细阅读了,但我不明白为什么跳跃列表的内存局部性差?为什么平衡树会导致良好的内存位置?


在我看来,内存局部性就是将数据存储在一个连续的内存中。我认为当读取地址x 中的数据时,CPU 会将地址x+1 中的数据加载到缓存中(基于多年前C 的一些实验)。因此遍历数组将导致缓存命中的可能性很高,我们可以说数组具有良好的内存局部性。

但是跳表和平衡树都不是数组,也不是连续存储数据的。所以我认为他们的内存位置都很差。那么有人可以为我解释一下吗?

最佳答案

也许那家伙的意思是在跳过列表节点(在默认实现的情况下)只有一个键值,而在b-tree 节点有N 线性布局。所以我们可以从 node 加载一堆 b-tree 键到缓存中。

你说过:

both aren't arrays and don't store data continuously

但我们有。我们将数据连续存储在 b-tree 节点

关于caching - 为什么 skiplist 内存局部性差但平衡树好?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35647064/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com