python - Why does file_object.tell() give the same byte position for different positions in the file?


Reposted by: 太空宇宙 · Updated: 2023-11-04 10:00:08


I'm just getting started with Python, and I can't get my head around the basic methods for navigating a file.

When I read the tell() tutorial, it states that it returns my current position in the file (in bytes).

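My reasoning is that the characters of the file add up to the byte coordinate, right? That would mean that after each newline (lines are just the string split on the \n character) my byte coordinate would change... but that seems to be incorrect.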

I generate a quick toy text file in bash:

$ for i in {1..10}; do echo "@ this is the "$i"th line" ; done > toy.txt
$ for i in {11..20}; do echo " this is the "$i"th line" ; done >> toy.txt

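Now I loop over this file and print out each line together with its line number and the result of a tell() call on every iteration. The @ marks some lines that delimit blocks of the file, and I want to be able to return to those lines (see below).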

My guess is that the for loop first iterates over the file object all the way to the end, which is why the value stays constant.

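This is a toy example. In my actual problem the files are gigabytes long, and by applying the same approach I get tell() results that, as I picture it, reflect the for loop iterating over the file object in blocks. Is that correct? Can you shed some light on the concept I'm missing?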

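My end goal is to be able to locate specific coordinates in the file and then process these huge files in parallel from distributed starting points, which I can't keep track of with the way I'm currently sifting through them.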

import os.path

os.path.getsize("toy.txt")  # -> 451

fa = open("toy.txt")
fa.seek(0) # let's double check
fa.tell()
count = 0
for line in fa:
    if line.startswith("@"):
        print line ,
        print "tell {} count {}".format(fa.tell(), count)
    else:
        if count < 32775:
            print line,
            print "tell {} count {}".format(fa.tell(), count)
    count += 1

Output:

@ this is the 1th line
tell 451 count 0
@ this is the 2th line
tell 451 count 1
@ this is the 3th line
tell 451 count 2
@ this is the 4th line
tell 451 count 3
@ this is the 5th line
tell 451 count 4
@ this is the 6th line
tell 451 count 5
@ this is the 7th line
tell 451 count 6
@ this is the 8th line
tell 451 count 7
@ this is the 9th line
tell 451 count 8
@ this is the 10th line
tell 451 count 9
this is the 11th line
tell 451 count 10
this is the 12th line
tell 451 count 11
this is the 13th line
tell 451 count 12
this is the 14th line
tell 451 count 13
this is the 15th line
tell 451 count 14
this is the 16th line
tell 451 count 15
this is the 17th line
tell 451 count 16
this is the 18th line
tell 451 count 17
this is the 19th line
tell 451 count 18
this is the 20th line
tell 451 count 19

Best answer

You are reading the file line by line with a for loop:

for line in fa:

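Files don't usually work like that; you read data in blocks. For Python to give you lines, it has to keep reading until the next newline character. But reading byte by byte to find a newline is not efficient.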

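So a buffer is used instead: a large block is read, the newlines in that block are located, and one line is yielded for each newline found. Once the buffer is used up, a new block is read.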
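A simplified sketch of that read-ahead strategy, written in Python 3 against a binary file object. The helper name `iter_lines_buffered` and its `chunk_size` parameter are hypothetical, chosen for illustration; Python's real file iterator is implemented in C and differs in detail. The point is that the file position moves a whole chunk at a time, not a line at a time:

```python
import io


def iter_lines_buffered(f, chunk_size=8):
    """Yield lines from binary file object f, reading chunk_size bytes at a time.

    Hypothetical illustration of read-ahead buffering, not CPython's actual code.
    """
    pending = b""  # bytes already read from the file but not yet handed out
    while True:
        chunk = f.read(chunk_size)  # the file position jumps by a whole chunk
        if not chunk:
            break
        pending += chunk
        # Hand out every complete line currently sitting in the buffer
        while True:
            newline = pending.find(b"\n")
            if newline == -1:
                break
            yield pending[:newline + 1]
            pending = pending[newline + 1:]
    if pending:  # final line without a trailing newline
        yield pending


data = b"first line\nsecond\nthird, unterminated"
f = io.BytesIO(data)
for line in iter_lines_buffered(f):
    # f.tell() reports how far the *file* has been read, which runs ahead of the line
    print(line, f.tell())
```

Here the first line ends at byte 11, yet `f.tell()` already reports 16 when that line is yielded, because two 8-byte chunks had to be read before its newline was found.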

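Your file isn't large enough to need more than one block; it is only 451 bytes, while buffers are typically measured in kilobytes. If you created a larger file, you would see the file position jump in large steps as you iterate.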
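To see those jumps without a gigabyte file, here is a Python 3 sketch (the file contents and the 8192-byte buffer size are arbitrary choices for the demo). It compares the buffered reader's own `tell()`, which subtracts whatever is still waiting in the buffer, with `f.raw.tell()`, the position of the underlying raw file, i.e. how far the operating system has actually been read:

```python
import os
import tempfile

# Throwaway file: 2000 lines of 19 bytes each (38,000 bytes), several buffers' worth
lines = [b"this is line %05d\n" % i for i in range(2000)]
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.writelines(lines)

logical = []  # f.tell(): end of the current line, buffer-aware
raw = []      # f.raw.tell(): actual OS-level file offset
# buffering=8192 requests an 8 KiB read-ahead buffer (an io.BufferedReader)
with open(path, "rb", buffering=8192) as f:
    for line in f:
        logical.append(f.tell())
        raw.append(f.raw.tell())
os.remove(path)

print("first logical positions:", logical[:3])  # [19, 38, 57]
print("distinct raw positions:", sorted(set(raw)))
```

The logical positions advance by exactly 19 bytes per line, while the raw position changes only a handful of times, in steps of up to the buffer size. That mirrors the question: Python 2's `file.tell()` does not account for the iterator's hidden read-ahead buffer, so once that buffer had swallowed the whole 451-byte file, every call returned 451.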

See the file.next documentation (next is the method that produces the next line during iteration, which is what a for loop does):

In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer.

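If you need to track the absolute file position while iterating over the lines, you'll have to use binary mode on Windows (to prevent newline translation) and keep track of the line lengths yourself: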

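position = 0
with open("toy.txt", "rb") as fa:  # binary mode: no newline translation
    for line in fa:
        position += len(line)  # byte offset just past the end of this line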
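Building on that counter, a minimal Python 3 sketch of what the question is after: record the byte offset at which each `@` line starts, then jump straight back to one of them with `seek()`. The function names `index_marker_offsets` and `read_line_at`, and the `marker` parameter, are hypothetical; the demo recreates `toy.txt` with the same content as the two bash loops in the question:

```python
import os
import tempfile


def index_marker_offsets(path, marker=b"@"):
    """Return the byte offset at which each line starting with `marker` begins.

    Binary mode makes len(line) an exact byte count on every platform.
    """
    offsets = []
    position = 0  # byte offset of the start of the current line
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(marker):
                offsets.append(position)
            position += len(line)
    return offsets


def read_line_at(path, offset):
    """Jump directly to a previously recorded offset and read one line."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline()


# Recreate toy.txt (same content as the bash loops in the question)
path = os.path.join(tempfile.mkdtemp(), "toy.txt")
with open(path, "wb") as out:
    for i in range(1, 11):
        out.write(b"@ this is the %dth line\n" % i)
    for i in range(11, 21):
        out.write(b" this is the %dth line\n" % i)

offsets = index_marker_offsets(path)
print(offsets)                          # [0, 23, 46, 69, 92, 115, 138, 161, 184, 207]
print(read_line_at(path, offsets[-1]))  # b'@ this is the 10th line\n'
```

The offsets are plain integers, so they can be saved and later handed to independent readers.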

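The alternative is to use the io library, the framework Python 3 uses for file handling. Its file.tell() method takes the buffer into account and produces an accurate file position, even while iterating.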

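Keep in mind that when you open a file in text mode with io.open(), you get unicode strings. In Python 2, if you must have str bytestrings, you can simply use binary mode (open with 'rb'). In fact, only in binary mode can you use IOBase.tell() while iterating; in text mode, calling tell() after next() raises an exception: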

>>> import io
>>> fa = io.open("toy.txt")
>>> next(fa)
u'@ this is the 1th line\n'
>>> fa.tell()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: telling position disabled by next() call
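The session above is Python 2. In Python 3, the built-in open() is io.open(), so the same restriction applies to ordinary text files, and the error surfaces as OSError. A minimal sketch, assuming Python 3 and a small made-up stand-in file: tell() fails inside a for loop, but works when lines are pulled with readline(). Per the io documentation, a text-mode tell() value is an opaque cookie meant for seek(), not necessarily a byte count:

```python
import os
import tempfile

# Small stand-in file (made-up contents) with two "@" block markers
path = os.path.join(tempfile.mkdtemp(), "blocks.txt")
with open(path, "wb") as out:
    out.write(b"@ first block\nbody\n@ second block\nbody\n")

# 1) Text mode + for loop: tell() is disabled while iterating with next()
error = None
with open(path, "r", encoding="ascii") as f:
    try:
        for line in f:
            f.tell()
    except OSError as exc:
        error = exc
print("for loop + tell():", error)  # telling position disabled by next() call

# 2) Text mode + readline(): tell() works again. The value is an opaque
#    cookie, only meaningful when passed back to seek() on the same file.
marks = []
with open(path, "r", encoding="ascii") as f:
    while True:
        cookie = f.tell()      # position *before* reading the next line
        line = f.readline()
        if not line:           # empty string signals end of file
            break
        if line.startswith("@"):
            marks.append(cookie)
    f.seek(marks[1])           # jump back to the second "@" line
    print(repr(f.readline()))  # '@ second block\n'
```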

In binary mode, you get accurate output from file.tell():

>>> import os.path
>>> os.path.getsize("toy.txt")
461
>>> fa = io.open("toy.txt", 'rb')
>>> count = 0
>>> for line in fa:
...     if line.startswith("@"):
...         print line ,
...         print "tell {} count {}".format(fa.tell(), count)
...     else:
...         if count < 32775:
...             print line,
...             print "tell {} count {}".format(fa.tell(), count)
...     count += 1
...
@ this is the 1th line
tell 23 count 0
@ this is the 2th line
tell 46 count 1
@ this is the 3th line
tell 69 count 2
@ this is the 4th line
tell 92 count 3
@ this is the 5th line
tell 115 count 4
@ this is the 6th line
tell 138 count 5
@ this is the 7th line
tell 161 count 6
@ this is the 8th line
tell 184 count 7
@ this is the 9th line
tell 207 count 8
@ this is the 10th line
tell 231 count 9
 this is the 11th line
tell 254 count 10
 this is the 12th line
tell 277 count 11
 this is the 13th line
tell 300 count 12
 this is the 14th line
tell 323 count 13
 this is the 15th line
tell 346 count 14
 this is the 16th line
tell 369 count 15
 this is the 17th line
tell 392 count 16
 this is the 18th line
tell 415 count 17
 this is the 19th line
tell 438 count 18
 this is the 20th line
tell 461 count 19
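For the question's end goal, processing a huge file from several starting points in parallel, binary-mode tell() and seek() are enough to cut the file at line boundaries. A minimal Python 3 sketch; `chunk_boundaries`, `iter_range`, and the demo file are hypothetical names for illustration, and the actual parallel dispatch (for example, handing each `(start, end)` pair to a separate process) is left out:

```python
import os
import tempfile


def chunk_boundaries(path, n_chunks):
    """Split a file into up to n_chunks (start, end) byte ranges aligned to lines.

    Hypothetical helper: cut points are spaced evenly by size, then each one is
    moved forward to the start of the next full line.
    """
    size = os.path.getsize(path)
    starts = [0]
    with open(path, "rb") as f:  # binary mode, so tell()/seek() are byte offsets
        for k in range(1, n_chunks):
            f.seek(size * k // n_chunks)
            f.readline()  # skip the (probably partial) line we landed in
            pos = f.tell()
            if starts[-1] < pos < size:  # skip empty or duplicate ranges
                starts.append(pos)
    return list(zip(starts, starts[1:] + [size]))


def iter_range(path, start, end):
    """Yield the lines in [start, end); this is what a single worker would run."""
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            yield line


# Demo on a made-up 1000-line file
path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(path, "wb") as out:
    for i in range(1, 1001):
        out.write(b"line %d\n" % i)

ranges = chunk_boundaries(path, 4)
print(ranges)
pieces = [list(iter_range(path, start, end)) for start, end in ranges]
print([len(p) for p in pieces])  # number of lines handled by each worker
```

Because every boundary is the start of a line, each worker reads whole lines only and no line is processed twice; a worker streams its range with readline() instead of loading it into memory.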

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/43987187/
