
python - List files in a folder as a stream to begin processing immediately

Reposted. Author: 太空狗. Updated: 2023-10-29 20:40:09

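I have a folder containing 1 million files.

I would like to begin processing the files immediately, while the folder is still being listed, using Python or another scripting language.

The usual functions (os.listdir in Python, for example) are blocking: my program has to wait until the whole list has been built, which can take a long time.

What is the best way to list a huge folder?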

Best answer

If convenient, change your directory structure; if not, you can use ctypes to call opendir and readdir.
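As an illustration of the first suggestion, here is a minimal sketch of one common way to restructure a huge flat folder: spread the files over nested subdirectories chosen from a hash of the file name. The helper name `shard_path`, the `/data` root, and the two-level, two-character layout are assumptions made for this example, not something the answer prescribes.

```python
import hashlib
import os


def shard_path(root, filename, levels=2, width=2):
    """Return a nested path such as root/ab/cd/filename.

    The subdirectory names are taken from the hex SHA-256 digest of the
    file name, so the same name always maps to the same location.
    (Hypothetical helper; the layout is an assumption for illustration.)
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


# Example: decide where a file would live under a sharded root.
target = shard_path("/data", "report.txt")
print(target)
```

With two levels of two hex characters there are 65,536 possible subdirectories, so 1 million files would average roughly 15 files per directory, and each directory can be listed quickly on its own.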

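Here is a copy of that code; all I did was indent it properly, add the try/finally block, and fix a bug. You might have to debug it, particularly the struct layout.

Note that this code is not portable. You would need to use different functions on Windows, and I think the structs vary from Unix to Unix.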

    #!/usr/bin/env python3
    """
    An equivalent of os.listdir, but as a generator using ctypes.

    Not portable: the struct layout below is meant to match struct dirent
    on 64-bit Linux with glibc and may differ on other Unix systems.
    """

    import os
    from ctypes import (CDLL, POINTER, Structure, c_char, c_char_p, c_int,
                        c_long, c_ubyte, c_ulong, c_ushort, get_errno)
    from ctypes.util import find_library


    class c_dir(Structure):
        """Opaque type for a directory stream, corresponds to DIR"""

    c_dir_p = POINTER(c_dir)


    class c_dirent(Structure):
        """Directory entry"""
        # FIXME not sure these are the exactly correct types on every platform!
        _fields_ = (
            ('d_ino', c_ulong),        # inode number
            ('d_off', c_long),         # offset to the next dirent
            ('d_reclen', c_ushort),    # length of this record
            ('d_type', c_ubyte),       # type of file; not supported by all file system types
            ('d_name', c_char * 256),  # NUL-terminated filename
        )

    c_dirent_p = POINTER(c_dirent)

    # use_errno=True lets us report why opendir failed
    c_lib = CDLL(find_library("c"), use_errno=True)

    opendir = c_lib.opendir
    opendir.argtypes = [c_char_p]
    opendir.restype = c_dir_p

    # readdir_r is deprecated in current glibc, so plain readdir is used here
    readdir = c_lib.readdir
    readdir.argtypes = [c_dir_p]
    readdir.restype = c_dirent_p

    closedir = c_lib.closedir
    closedir.argtypes = [c_dir_p]
    closedir.restype = c_int


    def listdir(path):
        """
        A generator to return the names of files in the directory passed in
        """
        dir_p = opendir(os.fsencode(path))
        if not dir_p:
            # opendir returned NULL: raise instead of passing NULL to readdir
            err = get_errno()
            raise OSError(err, os.strerror(err), path)
        try:
            while True:
                p = readdir(dir_p)
                if not p:
                    break
                name = p.contents.d_name  # bytes, cut at the first NUL
                if name not in (b".", b".."):
                    yield os.fsdecode(name)
        finally:
            closedir(dir_p)


    if __name__ == "__main__":
        for name in listdir("."):
            print(name)
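On Python 3.5 and later, the standard library's `os.scandir` returns an iterator over directory entries rather than a complete list, which may make the ctypes approach unnecessary and also works on Windows. The following is a minimal sketch, not part of the original answer; `iter_names` is a hypothetical helper name, and using `os.scandir` in a `with` statement assumes Python 3.6 or later.

```python
import os


def iter_names(path="."):
    """Yield entry names one at a time instead of building a full list first.

    os.scandir never yields the special entries "." and "..", so no
    filtering is needed here.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name
```

A loop such as `for name in iter_names("/some/huge/folder"): ...` can then start handling the first names while the rest of the directory is still being read. As with readdir, the order of the entries is arbitrary.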

Regarding "python - List files in a folder as a stream to begin processing immediately", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4403598/
