
python - Reading a large gzip file line by line

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:25:17

I need to count how many times a number appears in a gzip file with 2912232966 lines. This is what I have:

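import gzip

count = 0
# `file` is the path to the gzip archive, defined elsewhere.
with gzip.open(file, 'rb') as f:
    for line in f:
        lin = line.decode('utf-8')
        # The number is the first tab-separated field on each line.
        number = lin[:lin.index('\t')]
        if number == '2719708':
            count += 1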

But I get this error: 'CRC check failed 0xabc8df68 != 0xba1760acL'

It only works for at most 400000000 lines. Please help.

Best answer

Link to zlib

Quoting jiffyclub's answer here:

The issue with the gzip module is not that it can't decompress the partial file, the error occurs only at the end when it tries to verify the checksum of the decompressed content. (The original checksum is stored at the end of the compressed file so the verification will never, ever work with a partial file.)

The key is to trick gzip into skipping the verification. The answer by caesar0301 does this by modifying the gzip source code, but it's not necessary to go that far, simple monkey patching will do. I wrote this context manager to temporarily replace gzip.GzipFile._read_eof while I decompress the partial file:

This looks like exactly what you need...
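To illustrate the quoted point that the checksum sits at the very end of the file, here is a minimal sketch that reads the gzip trailer directly. It assumes a single-member gzip file, where the last 8 bytes are the CRC-32 of the uncompressed data followed by its length modulo 2**32, both little-endian. The function name `read_gzip_trailer` is made up for this example.

```python
import os
import struct


def read_gzip_trailer(path):
    """Return (crc32, isize) stored in the last 8 bytes of a gzip file.

    Assumes a single-member archive. isize is the uncompressed
    length modulo 2**32, so it wraps around for very large files.
    """
    with open(path, "rb") as fh:
        fh.seek(-8, os.SEEK_END)  # the trailer is the final 8 bytes
        crc32, isize = struct.unpack("<II", fh.read(8))
    return crc32, isize
```

If the file was cut short, these 8 bytes are just compressed data rather than a real trailer, which is why the check at the end can never pass on a partial file.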

Go to that link and read the whole answer.
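As an alternative that does not patch the gzip module at all, the counting can be sketched on top of `zlib.decompressobj`, which simply stops producing data when a truncated stream runs out instead of raising at the end. This is not the monkey patch from the quoted answer, only a rough sketch under some assumptions: the file is a single-member gzip archive, lines end with `\n`, and the helper name `count_first_field` is invented here. It will still raise `zlib.error` if the trailer is present but the CRC really does not match.

```python
import zlib


def count_first_field(path, target, chunk_size=1 << 20):
    """Count lines whose first tab-separated field equals `target` (bytes).

    Returns (count, complete); `complete` is False when the gzip
    stream ended before its trailer was seen (a truncated file).
    """
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
    count = 0
    tail = b""  # unfinished last line carried over to the next chunk
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            data = tail + decomp.decompress(chunk)
            lines = data.split(b"\n")
            tail = lines.pop()  # may be a partial line
            for line in lines:
                if line.split(b"\t", 1)[0] == target:
                    count += 1
    # Handle a final line that has no trailing newline.
    if tail and tail.split(b"\t", 1)[0] == target:
        count += 1
    return count, decomp.eof
```

A call such as `count_first_field(file, b'2719708')` returns the number of matches together with a flag. When the flag is `False`, the count only covers the part of the file that could be decompressed, and the last line counted may itself be incomplete.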


Found as the first Stack Exchange result when searching Google for "python gzip crc check failed".

Regarding python - reading a large gzip file line by line, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23326206/
