
cassandra - How does the Leveled Compaction Strategy ensure 90% of reads are from one sstable

Reposted. Author: 行者123. Updated: 2023-12-04 03:39:11

I am trying to understand how the Leveled Compaction Strategy in Cassandra works so that 90% of all reads can be satisfied from a single sstable.

From the DataStax docs:

new sstables are added to the first level, L0, and immediately compacted with the sstables in L1. When L1 fills up, extra sstables are promoted to L2. Subsequent sstables generated in L1 will be compacted with the sstables in L2 with which they overlap.

Best answer

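LeveledCompactionStrategy (LCS) in Cassandra implements the internals of LevelDB. You can check the exact implementation details in the LevelDB implementation doc.
To give you a simple explanation, consider the following points: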

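  • Each sstable is created when a fixed (relatively small) size limit is reached. By default, L0 gets 5MB files, and each subsequent level is 10x the size of the previous one (L1 holds 50MB of data, L2 holds 500MB, and so on).
  • Sstables are created with the guarantee that they do not overlap.
  • When a level fills up, a compaction is triggered and sstables from level L are promoted to level L+1. So L1 holds 50MB in ~10 files, L2 holds 500MB in ~100 files, and so on.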
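
The sizing arithmetic in these points can be sketched in a few lines. This is only a toy model: `SSTABLE_MB`, `FANOUT`, and both helper functions are hypothetical names, and the 5MB and 10x figures are the ones quoted in the answer, not values read from a Cassandra configuration.

```python
# Toy model of the level sizing described above.
# Assumption: 5 MB sstables and a 10x growth factor, as quoted in the answer.
SSTABLE_MB = 5
FANOUT = 10


def level_capacity_mb(level: int) -> int:
    """Approximate amount of data (in MB) that level `level` (>= 1) holds when full."""
    return SSTABLE_MB * FANOUT ** level


def level_file_count(level: int) -> int:
    """Approximate number of fixed-size sstables in a full level."""
    return level_capacity_mb(level) // SSTABLE_MB


for level in range(1, 4):
    print(f"L{level}: ~{level_capacity_mb(level)} MB in ~{level_file_count(level)} files")
```

Under these assumptions L1 comes out at about 50MB in 10 files and L2 at about 500MB in 100 files, matching the figures in the list.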

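  • The points above are the relevant details that justify 90% of reads coming from the same file (sstable). Let's do the math together and everything will become clearer (I hope :)
    Suppose you have keys A, B, C, D, E in L0, and each key takes 1MB of data.
    Next we insert key F. Because level 0 is full, a compaction creates a file with [A,B,C,D,E] in level 1, and F stays in level 0.
    That is about 83% of the data (5 of 6 keys) in 1 file in L1.
    Next we insert G, H, I, J and K. L0 fills up again, and L1 gets a new sstable with [F,G,H,I,J].
    By now we have K in L0, and [A,B,C,D,E] and [F,G,H,I,J] in L1.
    That is about 90% of the data (10 of 11 keys) in L1 :)
    If we keep inserting keys we get the same behavior, which is why about 90% of reads are served from roughly the same file/sstable.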
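
The worked example can be replayed with a small simulation. This is a minimal sketch of the toy model only (one key per MB, an L0 that holds five keys, and a compaction that moves the whole of L0 into a single new L1 sstable); `L0_CAPACITY`, `simulate`, and `fraction_in_l1` are hypothetical names, and real LCS is considerably more involved.

```python
from string import ascii_uppercase

# Assumption taken from the example: L0 holds five 1 MB keys before it is full.
L0_CAPACITY = 5


def simulate(keys):
    """Insert keys one at a time; return (l0, l1) in the toy model."""
    l0, l1 = [], []
    for key in keys:
        if len(l0) == L0_CAPACITY:
            # L0 is full: compact its contents into one new L1 sstable.
            l1.append(l0)
            l0 = []
        l0.append(key)
    return l0, l1


def fraction_in_l1(l0, l1):
    """Share of all keys that currently live in L1."""
    in_l1 = sum(len(sstable) for sstable in l1)
    return in_l1 / (in_l1 + len(l0))


for n in (6, 11):  # after inserting F, and after inserting K
    l0, l1 = simulate(ascii_uppercase[:n])
    print(f"{n} keys: L0={l0} L1={l1} share in L1={fraction_in_l1(l0, l1):.0%}")
```

With 6 keys the share in L1 is 5/6 (about 83%), and with 11 keys it is 10/11 (about 90%), which are the two figures in the walkthrough.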
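    The following paragraphs from the link I mentioned give deeper, more detailed information (including what happens with updates and tombstones). The sizes chosen for compaction differ because they are the LevelDB defaults, not those of C*: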

    When the size of level L exceeds its limit, we compact it in a background thread. The compaction picks a file from level L and all overlapping files from the next level L+1. Note that if a level-L file overlaps only part of a level-(L+1) file, the entire file at level-(L+1) is used as an input to the compaction and will be discarded after the compaction. Aside: because level-0 is special (files in it may overlap each other), we treat compactions from level-0 to level-1 specially: a level-0 compaction may pick more than one level-0 file in case some of these files overlap each other.

    A compaction merges the contents of the picked files to produce a sequence of level-(L+1) files. We switch to producing a new level-(L+1) file after the current output file has reached the target file size (2MB). We also switch to a new output file when the key range of the current output file has grown enough to overlap more than ten level-(L+2) files. This last rule ensures that a later compaction of a level-(L+1) file will not pick up too much data from level-(L+2).
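
Two of the rules quoted above (choosing every overlapping file from the next level, and starting a new output file once a target size is reached) can be illustrated as follows. This is a hedged sketch, not LevelDB's or Cassandra's code: `SSTable`, `overlaps`, `pick_compaction_inputs`, and `split_into_files` are hypothetical names, keys are plain strings, and the rule about overlapping more than ten level-(L+2) files is left out.

```python
from typing import NamedTuple


class SSTable(NamedTuple):
    """Hypothetical stand-in for a file: only its inclusive key range is modeled."""
    first_key: str
    last_key: str


def overlaps(a: SSTable, b: SSTable) -> bool:
    """True when the two inclusive key ranges share at least one key."""
    return a.first_key <= b.last_key and b.first_key <= a.last_key


def pick_compaction_inputs(picked: SSTable, next_level: list) -> list:
    """Every level-(L+1) file overlapping the picked level-L file is used whole."""
    return [table for table in next_level if overlaps(picked, table)]


def split_into_files(entries, target_size):
    """Group sorted (key, size) entries, closing a file once it reaches target_size."""
    files, current, current_size = [], [], 0
    for key, size in entries:
        current.append(key)
        current_size += size
        if current_size >= target_size:
            files.append(current)
            current, current_size = [], 0
    if current:
        files.append(current)
    return files


picked = SSTable("C", "H")
next_level = [SSTable("A", "D"), SSTable("E", "G"), SSTable("I", "M")]
print(pick_compaction_inputs(picked, next_level))
print(split_into_files([("a", 1), ("b", 1), ("c", 1)], target_size=2))
```

Here the picked file covering C to H pulls in the A to D file even though it only partly overlaps, which is the "entire file is used as an input" behavior from the quote.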


    Hope this helps!

    On the topic of cassandra - How does the Leveled Compaction Strategy ensure 90% of reads are from one sstable, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29766453/
