While trying to understand database internals and why certain design decisions were made, I ran into a question.
Let's assume that the only requirement is to look up the value for a given key; no other access patterns need to be supported.
If that is the case, why not store one file per key on disk instead of using the traditional LSM tree + SSTable approach? An insert would be O(1), since it just creates a new file, and a lookup would also be O(1), since we only need to check whether that key's file exists on disk.
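To make the proposal concrete, here is a minimal sketch of such a one-file-per-key store in Python. The class name `FilePerKeyStore`, the hex-encoding of keys into file names, and the write-to-temp-then-rename step are all illustrative assumptions, not part of the question or of any real database.

```python
import os
import tempfile
from typing import Optional


class FilePerKeyStore:
    """Hypothetical key-value store: each value lives in its own file."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Assumption: hex-encode the key so any string becomes a safe file name
        # (no '/', no reserved characters). Empty keys would map to the root dir.
        if not key:
            raise ValueError("key must be non-empty")
        return os.path.join(self.root, key.encode("utf-8").hex())

    def put(self, key: str, value: bytes) -> None:
        # Write to a temporary file, then atomically rename it over the target,
        # so a reader never observes a half-written value.
        target = self._path(key)
        tmp = target + ".tmp"  # hex names never end in ".tmp", so no clash
        with open(tmp, "wb") as f:
            f.write(value)
        os.replace(tmp, target)

    def get(self, key: str) -> Optional[bytes]:
        # One open() per lookup; a missing file means the key is absent.
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = FilePerKeyStore(d)
        store.put("user:1", b"alice")
        print(store.get("user:1"), store.get("missing"))
```

Every `put` creates (or replaces) a file and every `get` opens one, which is exactly the per-key file operation the question counts as O(1).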
I assume there must be a reason not to use this approach, but I can't see what it would be.
One reason I can think of is that data on disk is stored in blocks, and blocks retrieved from disk are cached, which in most cases reduces disk I/O. Seeking within an already open file is also faster than fetching a new file from disk, but again, this only helps in the average case.
In the worst case, since there would be far fewer SSTable files than files in a one-file-per-key layout, disk I/O would still be lower.
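For contrast, the "seek within an already open file" side of the comparison can be sketched as a single sorted file plus an in-memory index of record offsets. This is a heavily simplified, hypothetical SSTable-like layout: the record format (`key_len`, `value_len`, key, value), the names `write_sstable` and `SSTableReader`, and keeping the whole index in memory are assumptions for illustration. Real implementations such as LevelDB's add a block-based layout, compression, and filters, all omitted here.

```python
import bisect
import os
import struct
import tempfile
from typing import Dict, List, Optional, Tuple

HEADER = struct.Struct(">II")  # assumed record header: key length, value length


def write_sstable(path: str, items: Dict[bytes, bytes]) -> Tuple[List[bytes], List[int]]:
    """Write all pairs, sorted by key, into ONE file; return (keys, offsets)."""
    keys: List[bytes] = []
    offsets: List[int] = []
    with open(path, "wb") as f:
        for key in sorted(items):
            keys.append(key)
            offsets.append(f.tell())  # byte offset where this record starts
            value = items[key]
            f.write(HEADER.pack(len(key), len(value)))
            f.write(key)
            f.write(value)
    return keys, offsets


class SSTableReader:
    """Looks up keys by binary search over the index, then one seek + read."""

    def __init__(self, path: str, keys: List[bytes], offsets: List[int]) -> None:
        self.keys = keys
        self.offsets = offsets
        self.f = open(path, "rb")  # opened once and reused for every lookup

    def get(self, key: bytes) -> Optional[bytes]:
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            return None
        self.f.seek(self.offsets[i])
        key_len, value_len = HEADER.unpack(self.f.read(HEADER.size))
        self.f.seek(key_len, os.SEEK_CUR)  # skip over the stored key bytes
        return self.f.read(value_len)

    def close(self) -> None:
        self.f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "table.sst")
        keys, offsets = write_sstable(path, {b"b": b"2", b"a": b"1"})
        reader = SSTableReader(path, keys, offsets)
        print(reader.get(b"a"), reader.get(b"z"))
        reader.close()
```

Here many keys share one file handle, and neighbouring records share disk blocks, which is the effect the question attributes to block caching and a smaller file count.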
Apart from disk I/O, are there any other reasons we don't store one file per key?