python - Robustly use Git blame to retrieve SHA and line content (Python3)

Reposted by: 行者123  Updated: 2023-12-04 04:17:06

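I'm contributing to a package (Python >= 3.5) that uses git blame to retrieve information about files. I'm working on replacing the GitPython dependency with custom code that supports only the small subset of functionality we actually need (and provides the data in the form we actually need).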

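I found that git blame -lts comes closest to what I need, namely retrieving the commit SHA and the line content for each line in a file. It gives me output like this: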

82a3e5021b7131e31fc5b110194a77ebee907955 books/main/docs/index.md  5) Softwareplattform [ILIAS](https://www.ilias.de/), die an zahlreichen

which I have processed with

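        import re

        # Raw string, so that \s and \d are not treated as string escapes.
        line_pattern = re.compile(r'(.*?)\s.*\s*\d\)(\s*.*)')

        # cmd wraps the `git blame -lts` call (not shown here).
        for line in cmd.stdout():
            m = line_pattern.match(line)
            if m:
                sha = m.group(1)
                content = m.group(2).strip()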

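This works well. However, the maintainer of the package has rightly warned that "this may introduce hard-to-debug errors for very specific groups of users. Probably needs heavy unit testing across multiple OS and GIT versions."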
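One way such an error can arise is sketched below. This is an illustration, not a case reported in the question: the second input line is made up, and `parse` is a hypothetical helper. Because the `.*` in the middle of the pattern is greedy, the match anchors on the last digit followed by `)` in the line, so a source line that itself contains something like `foo(1)` loses the start of its content.

```python
import re

# Pattern from the question, written as a raw string.
line_pattern = re.compile(r'(.*?)\s.*\s*\d\)(\s*.*)')


def parse(line):
    """Hypothetical helper: return (sha, content), or None if no match."""
    m = line_pattern.match(line)
    return (m.group(1), m.group(2).strip()) if m else None


sha = "82a3e5021b7131e31fc5b110194a77ebee907955"

# The sample line from the question: parsed as intended.
ok_line = (sha + " books/main/docs/index.md  5) Softwareplattform "
           "[ILIAS](https://www.ilias.de/), die an zahlreichen")

# Made-up line whose content contains a digit followed by ")".
tricky_line = sha + " books/main/docs/index.md  7) call foo(1) twice"

ok_result = parse(ok_line)
tricky_result = parse(tricky_line)

print(ok_result)
print(tricky_result)  # the content is cut off after "foo(1)"
```

For the made-up line, the content comes back as `twice` instead of `call foo(1) twice`. Failures of this kind depend on the file contents rather than on the Git version, which makes them hard to spot in testing.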

I arrived at my approach because I found the output of git blame --porcelain somewhat tedious to parse.

30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 1
author XY
author-mail <XY>
author-time 1580742131
author-tz +0100
committer XY
committer-mail <XY>
committer-time 1580742131
committer-tz +0100
summary Stub-Outline-Dateien
filename home/docs/README.md
        hero: abcdefghijklmnopqrstuvwxyz
82a3e5021b7131e31fc5b110194a77ebee907955 18 18

82a3e5021b7131e31fc5b110194a77ebee907955 19 19
        ---
82a3e5021b7131e31fc5b110194a77ebee907955 20 20

...

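I don't like the housekeeping involved in this kind of iteration over a list of strings.

My questions are:

1) Should I rather use the --porcelain output, since it is explicitly intended for machine consumption?
2) Can I expect this format to be robust across Git versions and operating systems? Can I rely on the assumption that a line starting with a TAB character is the content line, that it is the last line of output for a source line, and that everything after that tab is the original line content?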

Best answer

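Not knowing whether this is the best solution, I gave it a try without waiting for an answer here. I assume that the answer to both of my questions is "yes".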

The following code can be seen in context here: https://github.com/uliska/mkdocs-git-authors-plugin/blob/6f5822c641452cea3edb82c2bbb9ed63bd254d2e/mkdocs_git_authors_plugin/repo.py#L466-L565

    def _process_git_blame(self):
        """
        Execute git blame and parse the results.

        This retrieves all data we need, also for the Commit object.
        Each line will be associated with a Commit object and counted
        to its author's "account".
        Whether empty lines are counted is determined by the
        count_empty_lines configuration option.

        git blame --porcelain will produce output like the following
        for each line in a file:

        When a commit is first seen in that file:
            30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 2 1
            author John Doe
            author-mail <j.doe@example.com>
            author-time 1580742131
            author-tz +0100
            committer John Doe
            committer-mail <j.doe@example.com>
            committer-time 1580742131
            summary Fancy commit message title
            filename home/docs/README.md
                    line content (indicated by TAB. May be empty after that)

        When a commit has already been seen *in that file*:
            82a3e5021b7131e31fc5b110194a77ebee907955 4 5
                    line content

        In this case the metadata is not repeated, but it is guaranteed that
        a Commit object with that SHA has already been created so we don't
        need that information anymore.

        When a line has not been committed yet:
            0000000000000000000000000000000000000000 1 1 1
            author Not Committed Yet
            author-mail <not.committed.yet>
            author-time 1583342617
            author-tz +0100
            committer Not Committed Yet
            committer-mail <not.committed.yet>
            committer-time 1583342617
            committer-tz +0100
            summary Version of books/main/docs/index.md from books/main/docs/index.md
            previous 1f0c3455841488fe0f010e5f56226026b5c5d0b3 books/main/docs/index.md
            filename books/main/docs/index.md
                    uncommitted line content

        In this case exactly one Commit object with the special SHA and fake
        author will be created and counted.

        Args:
            ---
        Returns:
            --- (this method works through side effects)
        """

        # Raw string, so that \w is not treated as a string escape.
        re_sha = re.compile(r'^\w{40}')

        cmd = GitCommand('blame', ['--porcelain', str(self._path)])
        cmd.run()

        commit_data = {}
        for line in cmd.stdout():
            key = line.split(' ')[0]
            m = re_sha.match(key)
            if m:
                commit_data = {
                    'sha': key
                }
            elif key in [
                'author',
                'author-mail',
                'author-time',
                'author-tz',
                'summary'
            ]:
                commit_data[key] = line[len(key)+1:]
            elif line.startswith('\t'):
                # assign the line to a commit
                # and create the Commit object if necessary
                commit = self.repo().get_commit(
                    commit_data.get('sha'),
                    # The following values are guaranteed to be present
                    # when a commit is seen for the first time,
                    # so they can be used for creating a Commit object.
                    author_name=commit_data.get('author'),
                    author_email=commit_data.get('author-mail'),
                    author_time=commit_data.get('author-time'),
                    author_tz=commit_data.get('author-tz'),
                    summary=commit_data.get('summary')
                )
                if len(line) > 1 or self.repo().config('count_empty_lines'):
                    author = commit.author()
                    if author not in self._authors:
                        self._authors.append(author)
                    author.add_lines(self, commit)
                    self.add_total_lines()
                    self.repo().add_total_lines()
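The method above depends on the plugin's own classes (GitCommand, the Commit objects, the repo configuration). For readers who only need the SHA and content of each line, the same idea can be sketched as a standalone function. This is a minimal sketch and not the plugin's code: `parse_porcelain`, `blame`, and `HEADER` are names made up for this example, and it assumes 40-character SHA-1 hashes and the header layout shown in the docstring above.

```python
import re
import subprocess

# Header line: 40-hex SHA, original line number, final line number and,
# on the first line of a group only, the number of lines in the group.
HEADER = re.compile(r'^([0-9a-f]{40}) \d+ \d+(?: \d+)?$')


def parse_porcelain(lines):
    """Yield (sha, content) for every source line in porcelain output."""
    sha = None
    for line in lines:
        if line.startswith('\t'):
            # Content line: everything after the leading TAB is the
            # source line. Checked first, so file content that happens
            # to look like a header is never mistaken for one.
            yield sha, line[1:]
            continue
        m = HEADER.match(line)
        if m:
            sha = m.group(1)
        # All other lines (author, summary, filename, ...) are ignored.


def blame(path):
    """Run git blame on path, which must be inside a Git work tree."""
    out = subprocess.run(
        ['git', 'blame', '--porcelain', '--', path],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    # Split on "\n" only: str.splitlines() would also break on other
    # separators (form feed, U+2028, ...) that may occur in content.
    return list(parse_porcelain(out.split('\n')))


# Usage on canned output modelled on the example in the question.
sample = (
    "30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 1\n"
    "author XY\n"
    "summary Stub-Outline-Dateien\n"
    "filename home/docs/README.md\n"
    "\thero: abcdefghijklmnopqrstuvwxyz\n"
    "82a3e5021b7131e31fc5b110194a77ebee907955 18 18\n"
    "\t\n"
    "82a3e5021b7131e31fc5b110194a77ebee907955 19 19\n"
    "\t---\n"
)
pairs = list(parse_porcelain(sample.split('\n')))
for pair in pairs:
    print(pair)
```

Because only the leading TAB decides what counts as a content line, nothing inside the content can confuse the parser, which is the main advantage over matching the human-readable output with a regular expression.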

Regarding python - Robustly use Git blame to retrieve SHA and line content (Python3), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60523415/
