python - os.path.exists() lies

Reposted. Author: 太空狗. Last updated: 2023-10-29 18:27:01
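I'm running a number of Python scripts on a Linux cluster, and the output of one job is usually the input to another script, which may be running on a different node. I find that there is a not insignificant lag before Python notices files that were created on other nodes: os.path.exists() returns False and open() fails as well. I can run a while not os.path.exists(mypath) loop until the file appears, but that can take a full minute, which is not optimal in a pipeline with many steps that may be processing many datasets in parallel.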

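The only workaround I've found so far is to call subprocess.Popen("ls %s"%(pathdir), shell=True), which magically fixes the problem. I assume this is probably a system issue, but could Python be causing it? Some kind of caching, perhaps? My sysadmin hasn't been much help so far.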
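
For reference, the polling loop described above, combined with a directory listing done from within Python rather than through a shell `ls`, might look roughly like the sketch below. The function name `wait_for_path` and its `timeout` and `interval` parameters are hypothetical. Whether `os.listdir()` refreshes the client-side cache the way `ls` did on this cluster is an assumption, not something confirmed here.

```python
import os
import time


def wait_for_path(path, timeout=60.0, interval=0.5):
    """Poll until `path` exists or `timeout` seconds pass (hypothetical helper).

    Before each check, list the parent directory. This mirrors the `ls`
    workaround from the question; that it has the same effect on a given
    NFS client is an assumption.
    """
    parent = os.path.dirname(os.path.abspath(path))
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.listdir(parent)  # reads the directory, like `ls` does
        except OSError:
            pass  # parent may not be visible yet; keep polling
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A bounded timeout keeps a pipeline step from hanging forever if the upstream job never produced its output.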

Best answer

os.path.exists() simply calls the C library's stat() function.
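
To make that concrete, here is a minimal sketch of a function that behaves like os.path.exists() by calling os.stat() directly. The name `exists_via_stat` is hypothetical. The point is that Python adds no cache of its own here: a False result just means the underlying stat() call failed.

```python
import os


def exists_via_stat(path):
    """Behave like os.path.exists(): stat() the path, treat failure as missing."""
    try:
        os.stat(path)  # thin wrapper over the C library's stat()
    except (OSError, ValueError):
        # stat() failed (e.g. ENOENT), or the path is not a valid string
        return False
    return True
```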

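I believe you are running into caching in the kernel's NFS implementation. Below is a link to a page that describes the problem and some ways to flush the cache.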

File Handle Caching

Directories cache file names to file handles mapping. The most common problems with this are:

•You have an opened file, and you need to check if the file has been replaced by a newer file. You have to flush the parent directory's file handle cache before stat() returns the new file's information and not the opened file's.

◦Actually this case has another problem: The old file may have been deleted and replaced by a new file, but both of the files may have the same inode. You can check this case by flushing the open file's attribute cache and then seeing if fstat() fails with ESTALE.

•You need to check if a file exists. For example a lock file. Kernel may have cached that the file does not exist, even if in reality it does. You have to flush the parent directory's negative file handle cache to see if the file really exists.

A few ways to flush the file handle cache:

•If the parent directory's mtime changed, the file handle cache gets flushed by flushing its attribute cache. This should work quite well if the NFS server supports nanosecond or microsecond resolution.

•Linux: chown() the directory to its current owner. The file handle cache is flushed if the call returns successfully.

•Solaris 9, 10: The only way is to try to rmdir() the parent directory. ENOTEMPTY means the cache is flushed. Trying to rmdir() the current directory fails with EINVAL and doesn't flush the cache.

•FreeBSD 6.2: The only way is to try to rmdir() either the parent directory or the file under it. ENOTEMPTY, ENOTDIR and EACCES failures mean the cache is flushed, but ENOENT did not flush it. FreeBSD does not cache negative entries, so they do not have to be flushed.

http://web.archive.org/web/20100912144722/http://www.unixcoding.org/NFSCoding
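
The Linux suggestion from the excerpt above (chown() the directory to its current owner, then check again) could be sketched in Python as follows. The helper name `exists_after_flush` is hypothetical. The sketch assumes a Unix system where os.chown() is available and where the calling user owns the parent directory (or is root); otherwise chown() fails and no flush happens. Whether the flush takes effect also depends on the kernel and NFS client in use.

```python
import os


def exists_after_flush(path):
    """Check `path` after trying to flush the parent's NFS file handle cache.

    Hypothetical helper: re-applies the parent directory's current owner
    and group with chown(), as the excerpt suggests for Linux.
    """
    parent = os.path.dirname(os.path.abspath(path))
    try:
        st = os.stat(parent)
        # Same uid/gid as before, so ownership is not actually changed.
        os.chown(parent, st.st_uid, st.st_gid)
    except OSError:
        # Not the owner, or the parent is missing: no flush happened,
        # so the result below may still come from a stale cache.
        pass
    return os.path.exists(path)
```

Unlike the shell `ls` workaround, this avoids spawning a subprocess for every check.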

Regarding python - os.path.exists() lies, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/3112546/
