
Python binary EOF

Repost. Author: 太空狗. Updated: 2023-10-30 01:48:44

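I want to read through a binary file. Googling "python binary eof" led me here.

Now, the questions:

  1. Why does the container (x in the SO answer) contain not a single (current) byte but a whole bunch of them? What am I doing wrong?
  2. If that is how it should be and I am not doing anything wrong, how do I read a single byte? I mean, is there any way to detect EOF while reading the file with the read(1) method?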

Best answer

Quoting the documentation:

file.read([size])

Read at most size bytes from the file (less if the read hits EOF before obtaining size bytes). If the size argument is negative or omitted, read all data until EOF is reached. The bytes are returned as a string object. An empty string is returned when EOF is encountered immediately. (For certain files, like ttys, it makes sense to continue reading after an EOF is hit.) Note that this method may call the underlying C function fread() more than once in an effort to acquire as close to size bytes as possible. Also note that when in non-blocking mode, less data than was requested may be returned, even if no size parameter was given.

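This means (for a regular file):

  • f.read(1) returns a bytes object containing 1 byte, or 0 bytes if EOF has been reached.
  • f.read(2) returns a bytes object containing 2 bytes, or 1 byte if EOF is reached after the first byte, or 0 bytes if EOF is encountered immediately.
  • ...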
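
These rules can be checked without touching the disk. The following is a minimal sketch, assuming Python 3 and using io.BytesIO as a stand-in for a regular file opened in 'rb' mode; the three-byte sample data is made up for illustration.

```python
import io

# In-memory stand-in for a 3-byte regular file opened with mode 'rb'.
f = io.BytesIO(b"abc")

first = f.read(2)   # two bytes are available, so both are returned
second = f.read(2)  # only one byte is left before EOF
third = f.read(2)   # already at EOF: an empty bytes object

print(first, second, third)
```

An empty bytes object is falsy, which is what makes a test such as `if not b:` work as an EOF check.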

If you want to read the file one byte at a time, you have to call read(1) in a loop and test whether the result is "empty":

# From answer by @Daniel
with open(filename, 'rb') as f:
    while True:
        b = f.read(1)
        if not b:
            # eof
            break
        do_something(b)
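
The same loop can also be written with the two-argument form of the built-in iter(), which keeps calling a function until it returns a sentinel value. This is a sketch of an alternative spelling, not the answer's own code: read_bytes is a hypothetical helper name, and io.BytesIO stands in for a real file.

```python
import io

def read_bytes(f):
    """Yield the contents of a binary file object one byte at a time."""
    # iter(callable, sentinel) calls f.read(1) repeatedly and stops
    # as soon as it returns b"", which is what read() gives at EOF.
    for b in iter(lambda: f.read(1), b""):
        yield b

# Usage with an in-memory stand-in for open(filename, 'rb'):
chunks = list(read_bytes(io.BytesIO(b"\x00\x01\n")))
print(chunks)
```

The EOF test is still the empty result of read(1); the sentinel form only moves it out of the loop body.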

If you want to read the file in "chunks" of 50 bytes at a time, you have to call read(50) in a loop:

with open(filename, 'rb') as f:
    while True:
        b = f.read(50)
        if not b:
            # eof
            break
        do_something(b)  # <- be prepared to handle a last chunk of length < 50
                         #    if the file length *is not* a multiple of 50

In fact, you can even break out of the loop one iteration sooner:

with open(filename, 'rb') as f:
    while True:
        b = f.read(50)
        do_something(b)  # <- be prepared to handle a last chunk of size 0
                         #    if the file length *is* a multiple of 50
                         #    (incl. 0 byte-length file!)
                         #    and be prepared to handle a last chunk of length < 50
                         #    if the file length *is not* a multiple of 50
        if len(b) < 50:
            break
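
To make the difference between the two 50-byte loops concrete, the sketch below records the chunk lengths that each loop style would hand to do_something. The function names and the in-memory test data are assumptions made for illustration only.

```python
import io

def lengths_test_first(f, size=50):
    """Chunk lengths seen when the empty result is tested before processing."""
    seen = []
    while True:
        b = f.read(size)
        if not b:
            break
        seen.append(len(b))
    return seen

def lengths_process_first(f, size=50):
    """Chunk lengths seen when the chunk is processed before the length test."""
    seen = []
    while True:
        b = f.read(size)
        seen.append(len(b))
        if len(b) < size:
            break
    return seen

for total in (0, 100, 120):
    data = bytes(total)  # 'total' zero bytes of made-up content
    print(total,
          lengths_test_first(io.BytesIO(data)),
          lengths_process_first(io.BytesIO(data)))
```

When the length is not a multiple of the chunk size, the second style skips the final read() that would only return an empty result. In exchange, the processing step must cope with an empty chunk whenever the length is a multiple of the chunk size.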

Regarding the other part of your question:

Why does the container [..] contain [..] a whole bunch of them [bytes]?

Quoting that code:

for x in file:
    i=i+1
    print(x)

Quoting the doc again:

A file object is its own iterator, [..]. When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing).

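The code above reads a binary file line by line, that is, it stops at every occurrence of the EOL character (\n). Usually this produces chunks of varying length, because most binary files contain that character at effectively random positions.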
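
A small sketch shows the effect. The byte string below is made-up data, and io.BytesIO stands in for a file opened in 'rb' mode (an assumption for illustration); iterating over it yields pieces that end at each b"\n".

```python
import io

# Made-up binary content that happens to contain two newline bytes.
data = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

# Iterating over a binary stream yields "lines" terminated by b"\n".
pieces = [x for x in io.BytesIO(data)]
print([len(x) for x in pieces])
```

The piece boundaries depend entirely on where a newline byte happens to occur in the data, not on any structure of the file format.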

I would not encourage you to read a binary file that way. Prefer a solution based on read(size).

A similar question about Python binary EOF can be found on Stack Overflow: https://stackoverflow.com/questions/25465792/
