
git - Speed up `git blame` on a repository with many commits

Reposted. Author: 行者123. Updated: 2023-12-04 02:44:31

I'm trying to git blame the following file on my local machine, since it is too slow for GitHub to generate the blame view:

https://github.com/Homebrew/homebrew-core/blob/master/Formula/sqlite.rb

But it is also slow when run locally, taking over a minute on my machine:

time git --no-pager blame Formula/sqlite.rb > /dev/null

The repository contains over 150K commits.

Is there any way to speed up the git blame command?

Best answer

With Git 2.27 (Q2 2020), "git blame" learned to take advantage of the "changed-paths" Bloom filters stored in the commit-graph file, the same filters introduced with git log.
See commit 1b4c57f, commit 24b7d1e, commit fe88f9f (23 Apr 2020) by Jeff King (peff).
See commit 0906ac2, commit b23ea97, commit 8918e37 (16 Apr 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 6d56d4c, 01 May 2020)

blame: use changed-path Bloom filters

Signed-off-by: Derrick Stolee


The changed-path Bloom filters help reduce the amount of tree parsing required during history queries.

Before calculating a diff, we can ask the filter if a path changed between a commit and its first parent.

  • If the filter says "no" then we can move on without parsing trees.
  • If the filter says "maybe" then we parse trees to discover if the answer is actually "yes" or "no".
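A minimal Bloom filter sketch in C makes the "no"/"maybe" contract concrete. It assumes a fixed 256-bit array and three FNV-1a-based probes per path. The names (toy_bloom, toy_bloom_add, toy_bloom_maybe_contains) are hypothetical, and Git's real filter sizing and hashing differ. Each commit would get one such filter, holding the paths it changed relative to its first parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy changed-path Bloom filter. FILTER_BITS and NUM_HASHES are
 * illustrative assumptions, not Git's actual parameters. */
#define FILTER_BITS 256
#define NUM_HASHES 3

struct toy_bloom {
    uint8_t bits[FILTER_BITS / 8];
};

/* FNV-1a over the path bytes, with a per-probe seed mixed into the start. */
static uint32_t toy_hash(const char *path, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

static void toy_bloom_init(struct toy_bloom *f)
{
    memset(f->bits, 0, sizeof(f->bits));
}

/* Record that `path` changed in this commit: set NUM_HASHES bits. */
static void toy_bloom_add(struct toy_bloom *f, const char *path)
{
    assert(path != NULL);
    for (uint32_t i = 0; i < NUM_HASHES; i++) {
        uint32_t bit = toy_hash(path, i * 0x9e3779b9u) % FILTER_BITS;
        f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* false = "no": the path definitely did not change, skip parsing trees.
 * true  = "maybe": fall back to a real tree diff to get the answer. */
static bool toy_bloom_maybe_contains(const struct toy_bloom *f, const char *path)
{
    for (uint32_t i = 0; i < NUM_HASHES; i++) {
        uint32_t bit = toy_hash(path, i * 0x9e3779b9u) % FILTER_BITS;
        if (!(f->bits[bit / 8] & (1u << (bit % 8))))
            return false;
    }
    return true;
}
```

A false return is always trustworthy, because adding a path only ever sets bits. A true return can be a false positive when other paths happen to set the same bits. That is why the "maybe" case still parses trees.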

When computing a blame, there is a section in find_origin() that computes a diff between a commit and one of its parents.
When this is the first parent, we can check the Bloom filters before calling diff_tree_oid().

In order to make this work with the blame machinery, we need to initialize a struct bloom_key with the initial path. But also, we need to add more keys to a list if a rename is detected. We then check to see if any of these keys answer "maybe" in the diff.
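The check made before calling diff_tree_oid() could look roughly like the following sketch. It is not Git's code. The filter is reduced to a hypothetical 64-bit mask with one bit per path, and key_list stands in for the list of struct bloom_key entries that grows when a rename is detected. The sketch also encodes the two restrictions from the commit message: filters are only consulted for the first parent, and not at all when copy detection (-C) is requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_KEYS 8

/* Toy per-path key: one bit out of 64, chosen by 64-bit FNV-1a.
 * Real changed-path filters are larger and use several hashes. */
static uint64_t key_bit(const char *path)
{
    uint64_t h = 0xcbf29ce484222325ull;   /* FNV-1a offset basis */
    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        h ^= *p;
        h *= 0x100000001b3ull;            /* FNV-1a prime */
    }
    return 1ull << (h % 64);
}

/* Hypothetical key list: the blamed path plus any older names
 * discovered through rename detection. */
struct key_list {
    uint64_t bits[MAX_KEYS];
    size_t nr;
};

static void key_list_add(struct key_list *keys, const char *path)
{
    assert(keys->nr < MAX_KEYS);
    keys->bits[keys->nr++] = key_bit(path);
}

/* Returns true when the tree diff must be computed, false when it can
 * be skipped because every key got a definite "no". */
static bool needs_tree_diff(uint64_t commit_filter, const struct key_list *keys,
                            bool is_first_parent, bool copy_detection)
{
    /* Filters only describe the diff against the first parent, and
     * they are not used at all under -C. */
    if (!is_first_parent || copy_detection)
        return true;
    for (size_t i = 0; i < keys->nr; i++)
        if (commit_filter & keys->bits[i])
            return true;  /* "maybe": parse trees to find out */
    return false;         /* "no" for every key: skip the tree diff */
}
```

For example, after a rename is detected, the caller would call key_list_add() with the old name. Later commits are then checked against both names.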

If a user requests copy detection using "git blame -C", then there are more places where the set of "important" files can expand. I do not know enough about how this happens in the blame machinery.
Thus, the Bloom filter integration is explicitly disabled in this mode.
A later change could expand the bloom_key data with an appropriate call (or calls) to add_bloom_key().

Generally, this is a performance enhancement and should not change the behavior of 'git blame' in any way.
If a repo has a commit-graph file with computed changed-path Bloom filters, then they should notice improved performance for their 'git blame' commands.

Here are some example timings that I found by blaming some paths in the Linux kernel repository:

I specifically looked for "deep" paths that were also edited many times.
As a counterpoint, the MAINTAINERS file was edited many times but is located in the root tree.
This means that the cost of computing a diff relative to the pathspec is very small. Here are the timings for that command:

These timings are the best of five.
The worst-case runs were on the order of 2.5 minutes for both cases.
Note that the MAINTAINERS file has 18,740 lines across 17,000+ commits. This happens to be one of the cases where this change provides the least improvement.

The lack of improvement for the MAINTAINERS file and the relatively modest improvement for the other examples can be easily explained.
The blame machinery needs to compute line-level diffs to determine which lines were changed by each commit. That makes up a large proportion of the computation time, and this change does not attempt to improve on that section of the algorithm.
The MAINTAINERS file is large and changed often, so it takes time to determine which lines were updated by which commit. In contrast, the code files are much smaller, and it takes longer to compute the line-by-line diff for a single patch on the Linux mailing lists.

Outside of the "-C" integration, I believe there is little more to gain from the changed-path Bloom filters for 'git blame' after this patch.



However, make sure to use Git 2.29 (Q4 2020), because of a small bug:
See commit 1302bad (08 Sep 2020) by Edmundo Carmona Antoranz (eantoranz).
(Merged by Junio C Hamano -- gitster -- in commit e1dd499, 18 Sep 2020)

blame.c: replace instance of !oidcmp for oideq

Signed-off-by: Edmundo Carmona Antoranz


0906ac2b ("blame: use changed-path Bloom filters", 2020-04-16, Git v2.27.0-rc0 -- merge listed in batch #6) introduced a call to oidcmp() that should have been oideq(), which was introduced in 14438c44 ("introduce hasheq() and oideq()", 2018-08-28, Git v2.20.0-rc0 -- merge listed in batch #1).



With Git 2.29 (Q4 2020), "git commit-graph write"(man) learned to limit the number of Bloom filters it computes from scratch, with the --max-new-filters option.
This benefits git blame.
See commit d356d5d, commit 98bb796, commit 59f0d50, commit 97ffa4f (17 Sep 2020), commit 809e032 (18 Sep 2020), commit 9a7a9ed, commit 312cff5 (16 Sep 2020), and commit b66d847, commit 24f951a, commit ab14d06, commit 025d529, commit 4f36440 (09 Sep 2020) by Taylor Blau (ttaylorr).
See commit b16a827 (16 Sep 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 288ed98, 29 Sep 2020)

builtin/commit-graph.c: introduce '--max-new-filters='

Helped-by: Junio C Hamano
Signed-off-by: Taylor Blau


Introduce a command-line flag to specify the maximum number of new Bloom filters that a 'git commit-graph write'(man) is willing to compute from scratch.

Prior to this patch, a commit-graph write with '--changed-paths' would compute Bloom filters for all selected commits which haven't already been computed (i.e., by a previous commit-graph write with '--split' such that a roll-up or replacement is performed).

This behavior can cause prohibitively-long commit-graph writes for a variety of reasons:

  • There may be lots of filters whose diffs take a long time to generate (for example, they have close to the maximum number of changes, diffing itself takes a long time, etc).
  • Old-style commit-graphs (which encode filters with too many entries as not having been computed at all) cause us to waste time recomputing filters that appear to have not been computed only to discover that they are too-large.

This can make the upper-bound of the time it takes for 'git commit-graph write --changed-paths'(man) to be rather unpredictable.

To make this command behave more predictably, introduce '--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters from scratch.
This lets "computing" already-known filters proceed quickly, while bounding the number of slow tasks that Git is willing to do.

The git commit-graph man page now includes:

With the --max-new-filters=<n> option, generate at most n new Bloom filters (if --changed-paths is specified).
If n is -1, no limit is enforced.
Only commits present in the new layer count against this limit.
To retroactively compute Bloom filters over earlier layers, it is advised to use --split=replace.
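The budgeting behavior of --max-new-filters can be modeled with a small loop. This is a hypothetical sketch, not the commit-graph writer. Each array entry is a commit in the new layer whose filter is either already known (cheap to reuse) or missing (expensive to compute). A negative limit means "no limit", mirroring the -1 case in the man page. The names filter_state, write_stats and write_filters are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-commit state while writing one commit-graph layer. */
enum filter_state { FILTER_MISSING, FILTER_KNOWN };

struct write_stats {
    size_t reused;    /* filters that already existed: cheap */
    size_t computed;  /* filters computed from scratch: slow */
    size_t skipped;   /* left without a filter because the budget ran out */
};

/* Apply a --max-new-filters style budget; max_new < 0 means no limit. */
static struct write_stats write_filters(enum filter_state *commits, size_t n,
                                        long max_new)
{
    struct write_stats st = {0, 0, 0};
    assert(commits != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (commits[i] == FILTER_KNOWN) {
            st.reused++;
        } else if (max_new < 0 || (long)st.computed < max_new) {
            commits[i] = FILTER_KNOWN;  /* stand-in for the expensive diff */
            st.computed++;
        } else {
            st.skipped++;
        }
    }
    return st;
}
```

Reused filters never consume the budget, so a run with a small limit still finishes quickly. Repeated runs gradually fill in the missing filters.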



With Git 2.31 (Q1 2021), "git blame"(man) gets an optimization:
See commit 8e16eff (17 Feb 2021) by Rafael Silva (raffs).
(Merged by Junio C Hamano -- gitster -- in commit 18decfd, 25 Feb 2021)

blame: remove unnecessary use of get_commit_info()

Signed-off-by: Rafael Silva
Reviewed-by: Taylor Blau


When running git blame(man) --color-by-age, determine_line_heat() is called to select how to color the output based on the commit's author date.
It uses get_commit_info() to parse the information into a commit_info structure; however, this is unnecessary because the caller of determine_line_heat() already does the same.

Instead, let's change determine_line_heat() to take a commit_info structure and remove the internal call to get_commit_info(), thus cleaning up and optimizing the code path.
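The refactoring amounts to "parse once, pass the parsed struct down". The sketch below assumes a simplified commit info holding only an author timestamp. toy_parse_commit_info(), toy_line_heat(), the parse_calls counter and the heat buckets are all hypothetical stand-ins for get_commit_info() and determine_line_heat(), whose real signatures are not reproduced here.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical simplified commit info: only the field heat needs. */
struct toy_commit_info {
    time_t author_time;
};

static int parse_calls; /* counts the (expensive) commit parsing */

/* Stand-in for the expensive parse of a commit object. */
static void toy_parse_commit_info(time_t raw_author_time,
                                  struct toy_commit_info *out)
{
    parse_calls++;
    out->author_time = raw_author_time;
}

/* Takes the caller's already-parsed info, so it never parses again.
 * The age buckets are illustrative, not Git's color thresholds. */
static int toy_line_heat(const struct toy_commit_info *ci, time_t now)
{
    assert(ci != NULL);
    time_t age = now - ci->author_time;
    if (age < 7 * 24 * 3600)
        return 2;  /* hot: under a week old */
    if (age < 365 * 24 * 3600)
        return 1;  /* warm: under a year old */
    return 0;      /* cold */
}
```

The caller already holds the parsed info, so computing the heat adds no second parse per line. That second parse is the cost the commit removes.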

Enabling Git's trace2 API in order to record the execution time for every call to determine_line_heat() function:

+ trace2_region_enter("blame", "determine_line_heat", the_repository);
determine_line_heat(ent, &default_color);
+ trace2_region_leave("blame", "determine_line_heat", the_repository);
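The measurement sums the time spent in one region across roughly 1.3k calls. That only works if every enter is paired with a matching leave. The following hypothetical accumulator (region_timer, region_enter, region_leave) illustrates that bookkeeping. It is not the trace2 API, and it uses the standard C clock(), which measures processor time rather than wall-clock time.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical accumulator: total time and call count for one region. */
struct region_timer {
    clock_t start;
    double total_ms;
    long calls;
};

/* Mark the start of one timed call. */
static void region_enter(struct region_timer *t)
{
    assert(t != NULL);
    t->start = clock();
}

/* Close the call opened by region_enter() and add its duration. */
static void region_leave(struct region_timer *t)
{
    clock_t end = clock();
    t->total_ms += (double)(end - t->start) * 1000.0 / CLOCKS_PER_SEC;
    t->calls++;
}
```

Wrapping each call in region_enter()/region_leave() and reading total_ms at the end gives a summed figure like the ones reported for the commit.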

Then, running git blame for "kernel/fork.c" in linux.git and summing the execution time of every call (around 1.3k calls) resulted in 2.6x faster execution (best out of 3):

git built from 328c109303 (The eighth batch, 2021-02-12) = 42ms
git built from 328c109303 + this change = 16ms
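The 2.6x figure follows directly from the two totals:

```latex
\frac{42\,\mathrm{ms}}{16\,\mathrm{ms}} = 2.625 \approx 2.6
```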

A similar question about speeding up `git blame` on a repository with many commits can be found on Stack Overflow: https://stackoverflow.com/questions/57837986/
