
git - Is the git binary diff algorithm (delta storage) standardized?

Repost. Author: IT王子. Updated: 2023-10-29 00:44:03

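Git uses delta compression to store objects that are similar to one another.

Is this algorithm standardized and used in other tools? Is there documentation describing the format? Is it compatible with xdelta/VCDIFF/RFC 3284?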

Best answer

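I think the diff algorithm used for pack files was tied to one of the existing delta encodings: initially (2005) xdelta, and then libXDiff.
But then, as detailed below, it shifted to a custom implementation.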

Anyway, as mentioned here:

Git does deltification only in packfiles.
But when you push via SSH git would generate a pack file with commits the other side doesn't have, and those packs are thin packs, so they also have deltas... but the remote side then adds bases to those thin packs making them standalone.

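(Note: creating many packfiles, or retrieving information from a huge packfile, is costly, which explains why git does not handle huge files or huge repos well.
See more in "git with large files".)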

This thread also reminds us:

Actually packfiles and deltification (LibXDiff, not xdelta) was, from what I remember and understand, originally because of network bandwidth (which is much more costly than disk space), and I/O performance of using single mmapped file instead of very large number of loose objects.

LibXDiff is mentioned in this 2008 thread.

Since then, however, the algorithm has evolved, probably into a custom one, as this 2011 thread illustrates and as the header of diff-delta.c points out:

So, strictly speaking, the current code in Git doesn't bear any resemblance with the libxdiff code at all.
However the basic algorithm behind both implementations is the same.
Studying the libxdiff version is probably easier in order to gain an understanding of how this works.

/*
 * diff-delta.c: generate a delta between two buffers
 *
 * This code was greatly inspired by parts of LibXDiff from Davide Libenzi
 * http://www.xmailserver.org/xdiff-lib.html
 *
 * Rewritten for GIT by Nicolas Pitre <nico@fluxnic.net>, (C) 2005-2007
 */

More on packfiles in the Git Book:

packfile format


Git 2.18 adds to the delta description in this new documentation section, which now (Q2 2018) states:

Object types

Valid object types are:

  • OBJ_COMMIT (1)
  • OBJ_TREE (2)
  • OBJ_BLOB (3)
  • OBJ_TAG (4)
  • OBJ_OFS_DELTA (6)
  • OBJ_REF_DELTA (7)

Type 5 is reserved for future expansion. Type 0 is invalid.
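
To make the type codes concrete, here is a minimal Python sketch (not part of the quoted documentation) that decodes the header at the start of a pack entry. The type codes come from the list above. The header layout is an assumption based on my reading of the pack format documentation: type in bits 6..4 of the first byte, size in the low 4 bits, continued 7 bits at a time while the top bit is set. The name parse_entry_header is hypothetical.

```python
# Type codes as listed in the quoted documentation.
OBJ_TYPES = {
    1: "OBJ_COMMIT",
    2: "OBJ_TREE",
    3: "OBJ_BLOB",
    4: "OBJ_TAG",
    6: "OBJ_OFS_DELTA",
    7: "OBJ_REF_DELTA",
}


def parse_entry_header(data: bytes, pos: int = 0):
    """Return (type_name, size, next_pos) for the entry header at data[pos:].

    Hypothetical helper; the bit layout is an assumption stated in the text.
    """
    byte = data[pos]
    pos += 1
    type_code = (byte >> 4) & 0x07  # bits 6..4: object type
    size = byte & 0x0F              # bits 3..0: lowest 4 bits of the size
    shift = 4
    while byte & 0x80:              # top bit set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    if type_code not in OBJ_TYPES:  # 0 is invalid, 5 is reserved
        raise ValueError(f"invalid or reserved object type {type_code}")
    return OBJ_TYPES[type_code], size, pos


# A blob whose inflated size is 300 bytes: 0xBC = continue | type 3 | low bits 12
print(parse_entry_header(bytes([0xBC, 0x12])))  # ('OBJ_BLOB', 300, 2)
```

For the two delta types, the size in this header is the size of the delta data, not of the reconstructed object.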

Deltified representation

Conceptually there are only four object types: commit, tree, tag and blob.
However to save space, an object could be stored as a "delta" of another "base" object.
These representations are assigned new types ofs-delta and ref-delta, which is only valid in a pack file.

Both ofs-delta and ref-delta store the "delta" to be applied to another object (called 'base object') to reconstruct the object.
The difference between them is,

  • ref-delta directly encodes 20-byte base object name.
  • If the base object is in the same pack, ofs-delta encodes the offset of the base object in the pack instead.

The base object could also be deltified if it's in the same pack.
Ref-delta can also refer to an object outside the pack (i.e. the so-called "thin pack"). When stored on disk however, the pack should be self contained to avoid cyclic dependency.
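
The difference between the two base references can be sketched as follows in Python. This is an illustration under stated assumptions, not Git's code: read_delta_base is a hypothetical name, the 20-byte name assumes SHA-1 object names as in the quoted text, and the ofs-delta decoding (each continuation byte adds 1 before shifting) follows my reading of the pack format documentation. The decoded offset is a distance counted backwards from the start of the delta entry itself.

```python
def read_delta_base(type_name: str, data: bytes, pos: int):
    """Return (base, next_pos) for the base reference that follows the header.

    base is a hex object name for ref-delta, or a backwards distance in
    bytes for ofs-delta. Hypothetical helper for illustration only.
    """
    if type_name == "OBJ_REF_DELTA":
        name = data[pos:pos + 20]   # 20-byte base object name (SHA-1 assumed)
        if len(name) != 20:
            raise ValueError("truncated base object name")
        return name.hex(), pos + 20
    if type_name == "OBJ_OFS_DELTA":
        byte = data[pos]
        pos += 1
        offset = byte & 0x7F
        while byte & 0x80:          # top bit set: more offset bytes follow
            byte = data[pos]
            pos += 1
            # The +1 keeps multi-byte encodings from overlapping shorter ones.
            offset = ((offset + 1) << 7) | (byte & 0x7F)
        return offset, pos
    raise ValueError(f"{type_name} is not a delta type")


print(read_delta_base("OBJ_OFS_DELTA", bytes([0x81, 0x02]), 0))  # (258, 2)
```

A reader would then locate the base at (position of the delta entry - offset) in the same pack for ofs-delta, or look the name up in any available pack or loose object for ref-delta.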

The delta data is a sequence of instructions to reconstruct an object from the base object.
If the base object is deltified, it must be converted to canonical form first. Each instruction appends more and more data to the target object until it's complete.
There are two supported instructions so far:

  • one for copy a byte range from the source object and
  • one for inserting new data embedded in the instruction itself.

Each instruction has variable length. Instruction type is determined by the seventh bit of the first octet. The following diagrams follow the convention in RFC 1951 (Deflate compressed data format).

Regarding "git - Is the git binary diff algorithm (delta storage) standardized?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9478023/
