
Git - print files mixed in different encodings

Reposted. Author: bug小助手. Updated: 2023-10-26 21:26:34



I am working on a project whose files use different encodings. (My OS is CentOS 7.)


For example, $SRC/a.cpp may be encoded in UTF-8, while $SRC/b.cpp is encoded in GB 2312 (Simplified Chinese).


Now if I run git diff, the content does not display properly because of the mixed encodings.


I've tried iconv like this:


git diff HEAD~1 | iconv -f gb2312 -t utf8 | less

It works well if all the files involved are encoded in GB 2312. But if any UTF-8 file is in the mix, iconv breaks like this:


some well displayed UTF-8 text
...
iconv: illegal input sequence at position 120

My question is whether it is possible to make commands like git diff work properly without changing the files themselves. I hope there is some script that passes only non-UTF-8 files to iconv, or some Git configuration that runs iconv on non-UTF-8 files only.


Edit: The client of this project requires some files to have specific encodings and wants as few changes as possible for stability, so modifying the files' encoding directly is not possible. A workaround that does not modify the project is preferred.


Comments

Wouldn't it be better to use UTF-8 encoding for all files?


@Bodo Yes, changing the file encoding would be the most straightforward way in my situation. But sadly, this project's client requires other files, such as Python scripts and XML files, to be in a given encoding that is not UTF-8. So I cannot avoid mixing files with different encodings.

Do they really need the files to have a specific encoding in Git? You could have the files encoded as UTF-8 in Git and change the encoding when building / preparing the software. All additional information should be part of the question. You can edit the question.


@Bodo Your suggestion provides another perspective. The client does require some files to have a specific encoding. This is a B2B business, so weird requests from the client are quite common. For the same reason, an encoding change is hard to push, because it means modifying the whole project. Besides, some Git GUI tools handle different encodings well, so it is hard to convince the client. That is why I am searching for a workaround that makes Git work properly in the CLI without changing the files' encoding. I will add these details to the question; thank you for the reminder.

Recommended answer

You might need a custom diff driver, set up through git config.


That driver script would first identify the encoding of each file and then convert it to UTF-8 if necessary before showing the diff.



Create a shell script (for instance git-diff-encoding.sh, made executable with chmod +x git-diff-encoding.sh) that identifies the encoding of the two file versions and converts them to UTF-8 if necessary before showing the diff.


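#!/bin/bash
# Git runs an external diff driver with seven arguments:
#   path old-file old-hex old-mode new-file new-hex new-mode
OLD="$2"
NEW="$5"

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT

# Print the path of a UTF-8 copy of $1 (written to $2), or $1 itself
# when no conversion is needed or iconv cannot handle the detected charset
to_utf8() {
    local enc
    # Identify the encoding using the file command
    enc=$(file -bi "$1" | awk -F charset= '{print $2}')
    case "$enc" in
        utf-8|us-ascii|binary|unknown-8bit|"") echo "$1" ;;
        *) iconv -f "$enc" -t utf-8 "$1" > "$2" 2>/dev/null && echo "$2" || echo "$1" ;;
    esac
}

OLD_UTF8=$(to_utf8 "$OLD" "$WORK/old")
NEW_UTF8=$(to_utf8 "$NEW" "$WORK/new")

# Show the diff of the (possibly converted) files. Plain diff is used here
# because Git treats a non-zero exit status from the driver as a failure,
# and both diff and git diff --no-index exit with 1 when the files differ.
diff -u --label "a/$1" --label "b/$1" "$OLD_UTF8" "$NEW_UTF8"
exit 0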
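One likely corner case is detection: the file command only guesses the charset, and GB 2312 text is often reported as something else, such as iso-8859-1. A minimal sketch of a stricter approach, assuming every non-UTF-8 file in the project uses one known legacy encoding, is to test whether the file is valid UTF-8 and otherwise convert from that encoding. The names is_utf8, to_utf8_stdout and FALLBACK_ENC are hypothetical and not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helper: print a file as UTF-8 on stdout.
# FALLBACK_ENC is an assumed setting for the project's legacy encoding.
FALLBACK_ENC="${FALLBACK_ENC:-GB2312}"

# Succeeds only if the file is already valid UTF-8 (pure ASCII also passes)
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Pass valid UTF-8 through unchanged; convert everything else
# from the assumed fallback encoding
to_utf8_stdout() {
    if is_utf8 "$1"; then
        cat "$1"
    else
        iconv -f "$FALLBACK_ENC" -t UTF-8 "$1"
    fi
}

# Convert the file given on the command line, if any
if [ "$#" -ge 1 ]; then
    to_utf8_stdout "$1"
fi
```

Because this helper takes a single path and writes to stdout, it has the shape Git expects from a diff.<driver>.textconv program, which may be a simpler alternative to a full external diff command. A short legacy-encoded file could in principle also be valid UTF-8 by coincidence, so treat the check as a heuristic.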

In your .git/config file, add the following lines to define a new diff driver called "encoding":



[diff "encoding"]
command = /path/to/your/git-diff-encoding.sh
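The same setting can also be written with git config instead of editing the file by hand. The sketch below runs in a throwaway repository (an assumption made only so it can be tried safely) and keeps the placeholder script path from the answer, which is not a real file.

```shell
# Throwaway repository so the commands can be tried safely
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Same setting as the [diff "encoding"] section, written via the CLI.
# The script path is the placeholder from the answer, not a real file.
git config diff.encoding.command /path/to/your/git-diff-encoding.sh

# Read it back to confirm what Git will run
git config --get diff.encoding.command
```

In a real project, run only the two git config lines from inside the repository, with the actual path to the script.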

Tell Git which files should be handled by this new diff driver. You can do this in your repository's .gitattributes file (create it, if it does not exist, at the root folder of your Git repository). Add lines specifying the files to be handled by your new diff driver, for example:



*.cpp diff=encoding

Now, git will use your custom diff script when running git diff for files matching the patterns specified in the .gitattributes file.

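To check that a pattern in .gitattributes actually selects a file, git check-attr can report which diff driver applies to a path. This is a small sketch in a throwaway repository; the paths src/b.cpp and README.md are only illustrative.

```shell
# Throwaway repository with the attribute line from the answer
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

printf '*.cpp diff=encoding\n' > .gitattributes

# check-attr reports the attribute value even for paths that do not exist yet
git check-attr diff -- src/b.cpp README.md
```

The .cpp path should be reported with the value encoding, while the other path should show the diff attribute as unspecified.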


Comments

Thank you for your detailed reply. I've tried it on the project, and it seems the bash script has some corner cases that crash the diff. Anyway, now I know what to do. I believe I can fix them myself, and then my problem will be solved.

@Lotus Great! Don't forget to post an answer once you have solved it!
