python - Getting newline statistics for a text file in Python

Repost. Author: 太空宇宙. Updated: 2023-11-04 08:56:30

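I have a nasty CRLF/LF conflict in a file in git, which was probably committed from a Windows machine. Is there a cross-platform way (preferably in Python) to detect which type of newline is dominant in the file?

I have this code (based on the idea from https://stackoverflow.com/a/10562258/239247):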

import sys

if not sys.argv[1:]:
    sys.exit('usage: %s <filename>' % sys.argv[0])

# read the file as raw bytes so no newline translation happens
with open(sys.argv[1], "rb") as f:
    d = f.read()

# bytes literals keep the counts working on both Python 2 and Python 3
crlf, lfcr = d.count(b'\r\n'), d.count(b'\n\r')
cr, lf = d.count(b'\r'), d.count(b'\n')
print('crlf: %s' % crlf)
print('lfcr: %s' % lfcr)
print('cr: %s' % cr)
print('lf: %s' % lf)
print('\ncr-crlf-lfcr: %s' % (cr - crlf - lfcr))
print('lf-crlf-lfcr: %s' % (lf - crlf - lfcr))
print('\ntotal (lf+cr-2*crlf-2*lfcr): %s\n' % (lf + cr - 2*crlf - 2*lfcr))

But it gives wrong statistics (for this file):

crlf: 1123
lfcr: 58
cr: 1123
lf: 1123

cr-crlf-lfcr: -58
lf-crlf-lfcr: -58

total (lf+cr-2*crlf-2*lfcr): -116
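
The numbers above are consistent with double counting: every CR and every LF in the file belongs to a CRLF pair (all three counts are 1123), so the 58 "lfcr" hits can only be the LF of one CRLF followed directly by the CR of the next, which happens wherever an empty line occurs. A minimal sketch on a made-up three-line sample (the `sample` bytes are an illustration, not taken from the file in question) reproduces the effect:

```python
# Made-up sample: three CRLF line endings, the second one ends an empty line.
sample = b'a\r\n\r\nb\r\n'

crlf = sample.count(b'\r\n')  # real CRLF endings
lfcr = sample.count(b'\n\r')  # matches across the boundary of two adjacent CRLFs
cr = sample.count(b'\r')      # also counts the CR inside every CRLF
lf = sample.count(b'\n')      # also counts the LF inside every CRLF

print(crlf, lfcr, cr, lf)     # 3 1 3 3
print(cr - crlf - lfcr)       # -1, although the sample has no lone CR at all
```

So `cr - crlf` alone would already give the number of lone CRs; subtracting the spurious `lfcr` count is what pushes the result below zero.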

Best answer

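import sys


def calculate_line_endings(path):
    # order matters: check the two-byte endings first so that
    # a CRLF line is not counted as a plain LF line
    endings = [
        b'\r\n',
        b'\n\r',
        b'\n',
        b'\r',
    ]
    counts = dict.fromkeys(endings, 0)

    # binary mode keeps the original line endings intact
    with open(path, 'rb') as fp:
        for line in fp:
            for x in endings:
                if line.endswith(x):
                    counts[x] += 1
                    break
    print(counts)


if __name__ == '__main__':
    # print the usage message only when the argument is missing
    if len(sys.argv) != 2:
        sys.exit('usage: %s <filepath>' % sys.argv[0])
    calculate_line_endings(sys.argv[1])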

For your file, the resulting counts are:

crlf: 1123
lfcr: 0
cr: 0
lf: 0

Is that enough?
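
One caveat with iterating over a file opened in binary mode: Python splits the lines at LF only, so a file that uses bare CR as its line terminator arrives as a single "line". As a possible alternative, here is a minimal sketch that tallies the endings with one regular expression over the raw bytes. The helper names `count_newlines` and `dominant_newline` are hypothetical, and the sketch assumes that only CRLF, lone CR and lone LF need to be told apart (an LF followed by a CR is counted as one LF plus one CR).

```python
import re
from collections import Counter

# CRLF is listed first so it is matched as one unit, not as CR followed by LF.
_NEWLINE = re.compile(rb'\r\n|\r|\n')
_NAMES = {b'\r\n': 'crlf', b'\r': 'cr', b'\n': 'lf'}


def count_newlines(data):
    """Return a Counter with the number of CRLF, lone CR and lone LF endings."""
    counts = Counter({name: 0 for name in _NAMES.values()})
    for match in _NEWLINE.finditer(data):
        counts[_NAMES[match.group()]] += 1
    return counts


def dominant_newline(data):
    """Return 'crlf', 'cr' or 'lf', or None if the data has no line endings."""
    name, n = count_newlines(data).most_common(1)[0]
    return name if n else None


if __name__ == '__main__':
    import sys
    if len(sys.argv) != 2:
        sys.exit('usage: %s <filepath>' % sys.argv[0])
    with open(sys.argv[1], 'rb') as fp:
        data = fp.read()
    print(dict(count_newlines(data)))
    print('dominant: %s' % dominant_newline(data))
```

Because the scanner consumes each CRLF as a whole, two adjacent CRLFs are never misread as an LF+CR pair. The whole file is read into memory, which is fine for typical source files but may not suit very large ones.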

Regarding "python - Getting newline statistics for a text file in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29695861/
