
sesame - Why is Sesame limited to, let's say, 150M triples?

Reposted. Author: 行者123. Updated: 2023-12-01 05:02:10

I wouldn't say it is limited in any strict sense, but as far as I can see, the advice given is along the lines of "if you need to go beyond that, you can change the backend storage...". Why? Why is Sesame not as efficient beyond 150-200M triples as, say, OWLIM or AllegroGraph? What optimizations would be needed to scale that far? Are the underlying data structures different?

Best Answer

Answered by @Jeen Broekstra here: http://answers.semanticweb.com/questions/21881/why-is-sesame-limited-to-lets-say-150m-triples

  1. the actual values that make up an RDF statement (that is, the subjects, predicates, and objects) are indexed in a relatively simple hash, mapping integer ids to actual data values. This index does a lot of in-memory caching to speed up lookups, but as the size of the store increases, the probability (during insertion or lookup) that a value is not present in the cache and needs to be retrieved from disk increases, and in addition the on-disk lookup itself becomes more expensive as the size of the hash increases.
  2. data retrieval in the native store has been balanced to make optimal use of the file system page size, for maximizing retrieval speed of B-tree nodes. This optimization relies on consecutive lookups reusing the same data block so that the OS-level page cache can be reused. This heuristic starts failing more often as transaction sizes (and therefore B-trees) grow, however.
  3. as B-trees grow in size, the chances of large cascading splits increase.
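To make point 1 concrete, here is a minimal sketch (not Sesame's actual implementation; the class name, cache size, and "disk" stand-in are all illustrative) of a value dictionary that maps integer ids to RDF values, with a small in-memory LRU cache in front of slower on-disk storage. Once the working set outgrows the cache, lookups start falling through to the expensive backing store:

```python
from collections import OrderedDict

class ValueDictionary:
    """Illustrative sketch of an id-to-value index with an LRU cache
    in front of on-disk storage (hypothetical, not Sesame's code)."""

    def __init__(self, cache_size=2):
        self._disk = {}              # stands in for the on-disk hash file
        self._cache = OrderedDict()  # in-memory LRU cache
        self._cache_size = cache_size
        self._next_id = 0
        self.disk_reads = 0          # counts cache misses that hit "disk"

    def store(self, value):
        vid = self._next_id
        self._next_id += 1
        self._disk[vid] = value
        self._touch(vid, value)
        return vid

    def lookup(self, vid):
        if vid in self._cache:       # cache hit: cheap
            self._cache.move_to_end(vid)
            return self._cache[vid]
        self.disk_reads += 1         # cache miss: simulate a disk read
        value = self._disk[vid]
        self._touch(vid, value)
        return value

    def _touch(self, vid, value):
        self._cache[vid] = value
        self._cache.move_to_end(vid)
        while len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict least recently used

d = ValueDictionary(cache_size=2)
a = d.store("http://example.org/subject")
b = d.store("http://example.org/predicate")
c = d.store("http://example.org/object")   # evicts the entry for `a`
d.lookup(c)                                 # still cached: no disk read
d.lookup(a)                                 # evicted: falls through to "disk"
print(d.disk_reads)  # → 1
```

The same effect plays out at store scale: with a fixed cache size, a larger store means a larger fraction of lookups miss the cache, and each miss pays the (growing) on-disk lookup cost.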

Regarding "sesame - Why is Sesame limited to, let's say, 150M triples?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15723851/
