python - Benchmark: does Python have a faster way of walking a network folder?

Reposted by: IT老高. Updated: 2023-10-28 22:05:46

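I need to walk through a folder containing roughly ten thousand files. My old VBScript handles this very slowly. Since then I have started using Ruby and Python, so I benchmarked the three scripting languages to see which is the best fit for this job.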

The results of the tests below, run on a subset of 4500 files on a network share, are:

Python: 106 seconds
Ruby: 5 seconds
Vbscript: 124 seconds

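It is no surprise that VBScript is the slowest, but I cannot explain the difference between Ruby and Python. Is my Python test suboptimal? Is there a faster way to do this in Python?

The check for thumbs.db is only there for the benchmark; in reality there are more checks to do.

I needed something that looks at every file under the path and does not produce so much output that it disturbs the timing. The results differ a little on each run, but not by much.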

#python2.7.0
import os

def recurse(path):
  for (path, dirs, files) in os.walk(path):
    for file in files:
      if file.lower() == "thumbs.db":
        print (path+'/'+file)

if __name__ == '__main__':
  import timeit
  path = '//server/share/folder/'
  print(timeit.timeit('recurse("'+path+'")', setup="from __main__ import recurse", number=1))

'vbscript5.7
set oFso = CreateObject("Scripting.FileSystemObject")
const path = "\\server\share\folder"
start = Timer
myLCfilename="thumbs.db"

sub recurse(folder)
  for each file in folder.Files
    if lCase(file.name) = myLCfilename then
      wscript.echo file
    end if
  next
  for each subfolder in folder.SubFolders
    call Recurse(subfolder)
  next
end Sub

set folder = oFso.getFolder(path)
recurse(folder)
wscript.echo Timer-start

#ruby1.9.3
require 'benchmark'

def recursive(path, bench)
  bench.report(path) do
    Dir["#{path}/**/**"].each{|file| puts file if File.basename(file).downcase == "thumbs.db"}
  end
end

path = '//server/share/folder/'
Benchmark.bm {|bench| recursive(path, bench)}

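EDIT: Because I suspected that printing caused the delay, I tested the scripts both printing all 4500 files and printing none. The difference remains: R: 5, P: 107 in the first case and R: 4.5, P: 107 in the latter.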
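One way to take printing out of the measurement entirely is to count matches instead of echoing them. The following is a minimal sketch, not the script used for the numbers above; `count_matches` and `timed` are hypothetical helper names introduced here for illustration.

```python
import os
from timeit import default_timer

def count_matches(top, target="thumbs.db"):
    """Walk `top` and count files named `target` (case-insensitive), printing nothing."""
    count = 0
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.lower() == target:
                count += 1
    return count

def timed(func, *args):
    """Call func(*args) once and return (result, elapsed_seconds)."""
    start = default_timer()
    result = func(*args)
    return result, default_timer() - start
```

Calling `timed(count_matches, '//server/share/folder/')` would then report only the traversal and comparison cost, with a single number to print at the end.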

EDIT2: Based on the answers and comments here, a Python version that in some cases can run faster by skipping folders:

import os

def recurse(path):
  for (path, dirs, files) in os.walk(path):
    for file in files:
      if file.lower() == "thumbs.db":
        print (path+'/'+file)

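def recurse2(path):
    for (path, dirs, files) in os.walk(path):
        # ('comics') is just the string 'comics', so use a real tuple, and
        # prune with slice assignment rather than removing while iterating.
        dirs[:] = [d for d in dirs if d not in ('comics',)]
        for file in files:
            if file.lower() == "thumbs.db":
                print (path+'/'+file)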


if __name__ == '__main__':
  import timeit
  path = 'f:/'
  print(timeit.timeit('recurse("'+path+'")', setup="from __main__ import recurse", number=1)) 
#6.20102692
  print(timeit.timeit('recurse2("'+path+'")', setup="from __main__ import recurse2", number=1)) 
#2.73848228
#ruby 5.7

Best answer

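The Ruby implementation of Dir is in C (the file dir.c, according to this documentation). The Python equivalent, however, is implemented in Python.

It is not surprising that Python performs worse than C here, but the approach used in Python gives a little more flexibility: for example, you can skip entire subtrees named e.g. '.svn', '.git', '.hg' while traversing a directory hierarchy.

Most of the time, the Python implementation is fast enough.

Update: Skipping files or subdirectories does not affect the traversal rate at all, but the overall time taken to process a directory tree can certainly be reduced, because you avoid traversing potentially large subtrees of the main tree. The time saved is of course proportional to how much you skip. In your case, which looks like folders of images, you are unlikely to save much time (unless the images are under revision control, in which case skipping the subtrees owned by the revision control system might have some impact).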
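To make "implemented in Python" concrete, here is a simplified sketch of a top-down walk written in pure Python. It follows the general shape of the `os.walk` that shipped with Python 2.7 (list the directory, then classify each name), but `simple_walk` is a hypothetical name used for illustration, and error handling, bottom-up order and symlink handling are left out. That the per-entry `isdir` check is what costs time on a network share is an assumption here, not something measured in this thread.

```python
import os

def simple_walk(top):
    """Simplified top-down directory walk (illustrative, not the real os.walk)."""
    dirs, files = [], []
    for name in os.listdir(top):
        # One extra filesystem query per entry, just to classify it.
        if os.path.isdir(os.path.join(top, name)):
            dirs.append(name)
        else:
            files.append(name)
    yield top, dirs, files
    # The caller may have edited `dirs` in place before execution resumes
    # here, which is what makes skipping whole subtrees possible.
    for name in dirs:
        for result in simple_walk(os.path.join(top, name)):
            yield result
```

Because the generator hands out its own `dirs` list before descending, anything the caller removes from that list is never visited.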

Additional update: Skipping folders is done by modifying the dirs value in place:

for root, dirs, files in os.walk(path):
    for skip in ('.hg', '.git', '.svn', '.bzr'):
        if skip in dirs:
            dirs.remove(skip)
    # Now process other stuff at this level, i.e.
    # in directory "root". The skipped folders
    # won't be recursed into.
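For a version that can be run and checked end to end, the pruning can be combined with the thumbs.db search from the question. This is a sketch under assumptions: `find_files` and `SKIP_DIRS` are hypothetical names, and slice assignment is used in place of `dirs.remove()`; both approaches change the list that `os.walk` holds.

```python
import os

# Folder names taken from the snippet above.
SKIP_DIRS = {'.hg', '.git', '.svn', '.bzr'}

def find_files(top, target="thumbs.db", skip=SKIP_DIRS):
    """Return paths of files named `target` (case-insensitive), never entering `skip` folders."""
    matches = []
    for root, dirs, files in os.walk(top):
        # Slice assignment edits the same list object os.walk uses to
        # decide where to descend, so pruned folders are never visited.
        dirs[:] = [d for d in dirs if d not in skip]
        for name in files:
            if name.lower() == target:
                matches.append(os.path.join(root, name))
    return matches
```

Note that rebinding the name, as in `dirs = [...]`, would not prune anything, because `os.walk` would still hold the original list.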

This question, "python - Benchmark: does Python have a faster way of walking a network folder?", corresponds to a similar question found on Stack Overflow: https://stackoverflow.com/questions/13138160/
