python - Threaded buffer iterator for preloading files in Python

Reposted. Author: 行者123. Updated: 2023-12-01 04:53:48

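I have millions of small files, and I want to create a FileLoader class that uses a background thread to preload them into an in-memory file pool, so that processing goes faster.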

My current solution uses a threaded buffer:

from itertools import islice, chain

class FileLoader(list):
    def __init__(self, file_list):
        # a list of file paths
        self.fl = file_list

    def Next(self, size=None):  # get the next size=N files
        if size:  # batch mode
            current_batch = []
            for f in self.fl:
                current_batch.append(open(f).read())
                if len(current_batch) == size:
                    yield current_batch
                    current_batch = []
            if current_batch:
                yield current_batch

        else:  # sequence mode
            for f in self.fl:
                yield open(f).read()

if __name__ == '__main__':
    fl = FileLoader(file_list)
    for fs in fl.Next(5):  # the files should be pooled in memory in advance
        pass  # ... my work ...

Best answer

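import multiprocessing

def get_contents(filename):
    # Read one file and return its full contents.
    with open(filename) as f:
        return f.read()

if __name__ == '__main__':  # guard needed on platforms that spawn worker processes
    # file_list is the list of file paths from the question
    pool = multiprocessing.Pool(processes=2)  # or more
    for fs in pool.imap(get_contents, file_list, 5):  # 5 is the chunk size here
        pass  # ... your work ...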

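If you don't care about the order, imap_unordered may be faster. Experiment with the chunk size and the number of processes. Unlike your draft, this approach yields the contents of one file at a time, but batching can be wrapped around it.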
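
The batching mentioned above could be wrapped around imap roughly as in the sketch below. This is an illustration under assumptions, not the answerer's code: the names in_batches, load_batches and demo, and the workers and ordered parameters, are hypothetical. It also swaps multiprocessing.Pool for multiprocessing.pool.ThreadPool, which offers the same imap and imap_unordered methods but runs threads, on the assumption that reading many small files is mostly I/O-bound.

```python
import os
import tempfile
from itertools import islice
from multiprocessing.pool import ThreadPool


def get_contents(filename):
    """Read one file and return its full text."""
    with open(filename) as f:
        return f.read()


def in_batches(iterable, size):
    """Yield lists of up to `size` consecutive items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_batches(file_list, size, workers=2, ordered=True):
    """Yield batches of file contents while worker threads keep reading.

    `workers` and `ordered` are illustrative parameters, not part of the
    original answer.
    """
    with ThreadPool(processes=workers) as pool:
        # imap keeps input order; imap_unordered yields results as they finish.
        imap = pool.imap if ordered else pool.imap_unordered
        yield from in_batches(imap(get_contents, file_list), size)


def demo():
    """Write 7 small temporary files and load them back in batches of 3."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(7):
            path = os.path.join(tmp, f"file{i}.txt")
            with open(path, "w") as f:
                f.write(f"contents {i}")
            paths.append(path)
        # Consume the generator before the temporary directory is removed.
        return list(load_batches(paths, 3))


if __name__ == "__main__":
    print([len(batch) for batch in demo()])  # prints [3, 3, 1]
```

Note that neither imap variant appears to limit how far ahead the workers read, so if the consumer is much slower than the disk, a large share of the files may sit in memory at once. With millions of files, a bounded queue fed by a reader thread may be the safer design.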

Regarding python - Threaded buffer iterator for preloading files in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27895392/
