gpt4 book ai didi

sqlite - 嵌入式键值数据库与每个键只存储一个文件?

转载 作者:IT王子 更新时间:2023-10-29 02:02:08 27 4
gpt4 key购买 nike

我对嵌入式键值数据库相对于每个键只在磁盘上存储一个文件的天真的解决方案的优势感到困惑。例如,RocksDB、Badger、SQLite 等数据库使用 B+ 树和 LSM 等奇特的数据结构,但似乎获得与这个简单解决方案大致相同的性能。

例如,Badger(最快的 Go 嵌入式数据库)需要 about 800 microseconds写一个条目。相比之下,从头开始创建一个新文件并向其中写入一些数据需要 150 个麦克风,并且没有进行优化。

编辑:澄清一下,这是我与最先进的嵌入式数据库进行比较的键值存储的简单实现。只需将每个键散列为一个字符串文件名,并将关联的值作为字节数组存储在该文件名中。读取和写入各约为 150 个麦克风,这对于单次操作比 Badger 更快,并且对于批处理操作而言具有可比性。此外,磁盘空间是最小的,因为除了实际值之外我们不存储任何额外的结构。

我一定在这里遗漏了一些东西,因为人们实际使用的解决方案非常花哨,并且使用诸如布隆过滤器和 B+ 树之类的东西进行了优化。

最佳答案

但 Badger 并不是要编写“一个”条目:

My writes are really slow. Why?

Are you creating a new transaction for every single key update? This will lead to very low throughput.

To get best write performance, batch up multiple writes inside a transaction using single DB.Update() call.
You could also have multiple such DB.Update() calls being made concurrently from multiple goroutines.

这导致 issue 396 :

I was looking for fast storage in Go and so my first try was BoltDB. I need a lot of single-write transactions. Bolt was able to do about 240 rq/s.

I just tested Badger and I got a crazy 10k rq/s. I am just baffled

那是因为:

LSM tree has an advantage compared to B+ tree when it comes to writes.
Also, values are stored separately in value log files so writes are much faster.

You can read more about the design here.


要点之一(很难通过简单的文件读/写来复制)是:

Key-Value separation

The major performance cost of LSM-trees is the compaction process. During compactions, multiple files are read into memory, sorted, and written back. Sorting is essential for efficient retrieval, for both key lookups and range iterations. With sorting, the key lookups would only require accessing at most one file per level (excluding level zero, where we’d need to check all the files). Iterations would result in sequential access to multiple files.

Each file is of fixed size, to enhance caching. Values tend to be larger than keys. When you store values along with the keys, the amount of data that needs to be compacted grows significantly.

In Badger, only a pointer to the value in the value log is stored alongside the key. Badger employs delta encoding for keys to reduce the effective size even further. Assuming 16 bytes per key and 16 bytes per value pointer, a single 64MB file can store two million key-value pairs.

关于sqlite - 嵌入式键值数据库与每个键只存储一个文件?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/49744812/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com