python - How do I sort a large file with Python?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:33:49

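I found some promising code on activestate.com for sorting large files. I am trying to run it on the default Python 2.6.5 interpreter on Ubuntu 10.04. When I run it on a small test file, I get the traceback shown below. I asked for help on activestate.com, but that thread has been silent for more than 18 months. Does anyone here see an obvious solution?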

Thanks.

## {{{ http://code.activestate.com/recipes/576755/ (r3)
# based on Recipe 466302: Sorting big files the Python 2.4 way
# by Nicolas Lehuen

import os
from tempfile import gettempdir
from itertools import islice, cycle
from collections import namedtuple
import heapq

Keyed = namedtuple("Keyed", ["key", "obj"])

def merge(key=None, *iterables):
    # based on code posted by Scott David Daniels in c.l.p.
    # http://groups.google.com/group/comp.lang.python/msg/484f01f1ea3c832d

    if key is None:
        keyed_iterables = iterables
    else:
        keyed_iterables = [(Keyed(key(obj), obj) for obj in iterable)
                            for iterable in iterables]

    for element in heapq.merge(*keyed_iterables):
        yield element.obj


def batch_sort(input, output, key=None, buffer_size=32000, tempdirs=None):
    if tempdirs is None:
        tempdirs = []
    if not tempdirs:
        tempdirs.append(gettempdir())

    chunks = []
    try:
        with open(input,'rb',64*1024) as input_file:
            input_iterator = iter(input_file)
            for tempdir in cycle(tempdirs):
                current_chunk = list(islice(input_iterator,buffer_size))
                if not current_chunk:
                    break
                current_chunk.sort(key=key)
                output_chunk = open(os.path.join(tempdir,'%06i'%len(chunks)),'w+b',64*1024)
                chunks.append(output_chunk)
                output_chunk.writelines(current_chunk)
                output_chunk.flush()
                output_chunk.seek(0)
        with open(output,'wb',64*1024) as output_file:
            output_file.writelines(merge(key, *chunks))
    finally:
        for chunk in chunks:
            try:
                chunk.close()
                os.remove(chunk.name)
            except Exception:
                pass

Traceback:

Traceback (most recent call last):
  File "./batch_sort.py", line 108, in <module>
    batch_sort(args[0],args[1],options.key,options.buffer_size,options.tempdirs)
  File "./batch_sort.py", line 54, in batch_sort
    output_file.writelines(merge(key, *chunks))
  File "./batch_sort.py", line 30, in merge
    yield element.obj
AttributeError: 'str' object has no attribute 'obj'

Best answer

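The merge code is incorrect. If you do not supply a key, the iterables are passed to heapq.merge unwrapped, so each element is a plain string rather than a Keyed tuple, and element.obj raises the AttributeError.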
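The failure can be reproduced without any files. The following is only a minimal sketch: the in-memory lists stand in for the sorted chunk files, and `broken_merge` is a hypothetical name for a copy of the recipe's merge logic.

```python
import heapq
from collections import namedtuple

Keyed = namedtuple("Keyed", ["key", "obj"])

def broken_merge(key=None, *iterables):
    # Same logic as the recipe's merge(): with key=None the raw
    # iterables are merged, so every element is a plain string.
    if key is None:
        keyed_iterables = iterables
    else:
        keyed_iterables = [(Keyed(key(obj), obj) for obj in iterable)
                           for iterable in iterables]
    for element in heapq.merge(*keyed_iterables):
        yield element.obj  # a str has no .obj attribute

# Hypothetical stand-ins for two already-sorted chunk files.
chunk_a = ["apple\n", "cherry\n"]
chunk_b = ["banana\n", "date\n"]

try:
    list(broken_merge(None, chunk_a, chunk_b))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# With a key the same function works, because every element is wrapped.
with_key = list(broken_merge(len, ["a\n", "ccc\n"], ["bb\n"]))
print(with_key)
```

The bug only shows up on the default path, which is why a run without a key option fails while a run with one would not.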

Try this:

def merge(key=None, *iterables):
    # based on code posted by Scott David Daniels in c.l.p.
    # http://groups.google.com/group/comp.lang.python/msg/484f01f1ea3c832d

    if key is None:
        for element in heapq.merge(*iterables):
            yield element
    else:
        keyed_iterables = [(Keyed(key(obj), obj) for obj in iterable)
                            for iterable in iterables]
        for element in heapq.merge(*keyed_iterables):
            yield element.obj
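As a quick check, the corrected function can be exercised on small in-memory lists instead of temporary files. This is a sketch under assumptions: the sample data and the `len` key are made up for illustration. It also shows `merge_py3`, a hypothetical alternative for Python 3.5 and later, where `heapq.merge` accepts a `key` argument itself; that option is not available on the Python 2.6.5 interpreter from the question.

```python
import heapq
from collections import namedtuple

Keyed = namedtuple("Keyed", ["key", "obj"])

def merge(key=None, *iterables):
    # Corrected version: only unwrap .obj when elements were wrapped.
    if key is None:
        for element in heapq.merge(*iterables):
            yield element
    else:
        keyed_iterables = [(Keyed(key(obj), obj) for obj in iterable)
                           for iterable in iterables]
        for element in heapq.merge(*keyed_iterables):
            yield element.obj

def merge_py3(key=None, *iterables):
    # Python 3.5+ only: heapq.merge takes key= directly,
    # so no wrapper tuple is needed.
    return heapq.merge(*iterables, key=key)

# Hypothetical stand-ins for sorted chunk files (no key).
chunk_a = ["apple\n", "cherry\n"]
chunk_b = ["banana\n", "date\n"]
plain = list(merge(None, chunk_a, chunk_b))

# With a key, each input must already be sorted by that key.
by_len_a = ["a\n", "ccc\n"]
by_len_b = ["bb\n", "dddd\n"]
keyed = list(merge(len, by_len_a, by_len_b))
keyed_py3 = list(merge_py3(len, by_len_a, by_len_b))

print(plain)
print(keyed)
print(keyed_py3)
```

The two keyed variants can order ties differently: the Keyed tuples fall back to comparing the objects themselves when keys are equal, while `heapq.merge(key=...)` preserves the order of the inputs.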

Regarding "python - How do I sort a large file with Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10665925/
