Why is shutil.rmtree() so slow?

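I went to check how to remove a directory in Python, and was led to use shutil.rmtree(). Its speed surprised me, compared to what I'd expect from rm --recursive. Are there faster alternatives, short of using the subprocess module?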

Source

2011-03-29 Tshepang

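How big or deep is your directory? Do you have a few directories containing many files, or a very deep hierarchy? – 2011-03-29 10:24:08

@DavidCournapeau: It's a bunch of build directories, so it's a very deep hierarchy. – Tshepang 2011-03-29 10:26:55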

The implementation does a lot of extra processing:

import os
import stat
import sys

def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info(). If ignore_errors
    is false and onerror is None, an exception is raised.

    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        # one listdir() call per directory
        names = os.listdir(path)
    except os.error:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        # a full path string is built for every entry
        fullname = os.path.join(path, name)
        try:
            # one lstat() call, and one Python stat object, per entry
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())

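Note the os.path.join() used to build each new filename; string operations do take time. The rm(1) implementation instead uses the unlinkat(2) system call, which does no additional string operations. (In fact, it also spares the kernel from walking the full namei() path lookup over and over again; the kernel's dentry cache works well, but that is still a fair amount of in-kernel string manipulation and comparison.) The rm(1) utility gets to bypass all of that string manipulation and simply use a file descriptor for the directory.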
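To make the unlinkat(2) point concrete, here is a minimal sketch, assuming Python 3 on a Unix-like system where the os functions accept the dir_fd argument. The name rmtree_at is a hypothetical helper, not part of shutil, and the sketch has no error handling. It removes every entry relative to an open directory descriptor instead of joining full paths:

```python
import os
import stat

def rmtree_at(name, dir_fd=None):
    """Remove the tree `name`, resolved relative to `dir_fd` (hypothetical helper)."""
    # Open the directory itself; O_NOFOLLOW refuses to descend through a symlink.
    fd = os.open(name, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW, dir_fd=dir_fd)
    try:
        for entry in os.listdir(fd):
            # Every call below takes a bare entry name plus the descriptor,
            # so no os.path.join() and no lookup from the root of the tree.
            mode = os.lstat(entry, dir_fd=fd).st_mode
            if stat.S_ISDIR(mode):
                rmtree_at(entry, dir_fd=fd)
            else:
                os.unlink(entry, dir_fd=fd)
    finally:
        os.close(fd)
    os.rmdir(name, dir_fd=dir_fd)
```

Calling rmtree_at("/some/build/dir") would remove that tree. Whether this is measurably faster than the os.path.join() version depends on the tree and the filesystem, so it is worth timing before relying on it.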

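Additionally, both rm(1) and rmtree() check the st_mode of every file and directory in the tree; but the C implementation does not need to turn every struct statbuf into a Python object just to perform a simple integer mask operation. I don't know how long this takes, but it happens for every file, directory, pipe, symlink, and so on in the directory tree.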
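One way to sidestep the per-entry lstat() in Python is os.scandir(), which exposes the entry type that the directory listing already carries on most platforms. This is only a sketch assuming Python 3.6 or later; split_entries is a hypothetical helper name used for illustration:

```python
import os

def split_entries(path):
    """Return (dirs, others) for `path` without an explicit lstat() per entry."""
    dirs, others = [], []
    with os.scandir(path) as it:
        for entry in it:
            # is_dir() normally answers from the type information returned
            # by the directory listing, falling back to a stat call only
            # when the filesystem does not provide it.
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                others.append(entry.name)
    return sorted(dirs), sorted(others)
```

A symlink that points at a directory lands in the second list, which matches what the lstat()-based check in rmtree() does.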

Source

2011-03-29 10:44:11 sarnold

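Forget the string operations, they are irrelevant. The additional disk access is the speed difference. – 2011-03-29 13:03:11

Not necessarily: if the cache is hot (for instance, when shutil.rmtree runs on a build tree right after the build), it could well be significant. – 2011-03-29 13:49:23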

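If you care about speed:

Use os.system('rm -fr "%s"' % your_dirname)

Apart from that, I did not find shutil.rmtree() to be much slower... of course there is extra overhead at the Python level. Beyond that, I will only believe such a claim if you provide reasonable numbers.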
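For anyone who wants actual numbers, a rough timing harness could look like the following. It is a sketch, not a rigorous benchmark: build_tree, timed, rm_rf and compare are hypothetical helper names, the tree shape is arbitrary, and the results will depend heavily on the filesystem and on whether the cache is hot.

```python
import os
import shutil
import subprocess
import tempfile
import time

def build_tree(depth=4, fanout=3, files_per_dir=20):
    """Create a throwaway directory tree and return its root path."""
    root = tempfile.mkdtemp(prefix="rmtree_bench_")
    level = [root]
    for _ in range(depth):
        next_level = []
        for parent in level:
            for i in range(files_per_dir):
                with open(os.path.join(parent, "f%d.txt" % i), "w") as fh:
                    fh.write("x")
            for i in range(fanout):
                child = os.path.join(parent, "d%d" % i)
                os.mkdir(child)
                next_level.append(child)
        level = next_level
    return root

def timed(remover, path):
    """Return the wall-clock seconds that remover(path) takes."""
    start = time.perf_counter()
    remover(path)
    return time.perf_counter() - start

def rm_rf(path):
    # Passing an argument list avoids the shell quoting issues of os.system().
    subprocess.run(["rm", "-rf", "--", path], check=True)

def compare(**tree_args):
    """Time both approaches, each on its own freshly built tree."""
    return {
        "shutil.rmtree": timed(shutil.rmtree, build_tree(**tree_args)),
        "rm -rf": timed(rm_rf, build_tree(**tree_args)),
    }
```

Running print(compare(depth=5, fanout=4)) a few times gives a first impression; the trees were just written, so this measures the hot-cache case.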

Source

2011-03-29 10:20:12

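By *short of using the subprocess module*, I meant no external system calls like os.system(). – Tshepang 2011-03-29 10:30:04

It depends: calling os.system() or subprocess can be slower. If you call it very often, the operating system has to create a lot of processes, and in the end the Python version in shutil will be faster. – guettli 2012-08-21 15:12:31

For a directory with about 15,000 small files (<10KB) and nothing else, it takes several minutes with no sign of progress. Deleting it by other means is much faster. – 2016-02-29 17:57:23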

Although I don't know what is wrong, you can try other approaches, for example removing all the files first and then the directories:

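import os

# Walk bottom-up so every directory is already empty when it is removed.
for root, dirs, files in os.walk("path", topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    os.rmdir(root)  # rmdir, not removedirs, which would also prune empty parents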

Source

2011-03-29 10:44:34 kurumi

However, when I tried it, 'os.removedirs(r)' removed the root directory as well, not just the empty directories, right? – pebox11 2016-06-22 11:35:50
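The behaviour raised in this comment is documented: os.removedirs() removes the leaf directory and then keeps removing parent directories for as long as they are empty. A small sketch to see it in action; removedirs_demo is a hypothetical helper name and it only touches a fresh temporary directory:

```python
import os
import tempfile

def removedirs_demo():
    """Build base/a/b/c, call os.removedirs() on the leaf, return base."""
    base = tempfile.mkdtemp()
    # A sibling file keeps `base` non-empty, which stops the upward pruning.
    open(os.path.join(base, "keep.txt"), "w").close()
    leaf = os.path.join(base, "a", "b", "c")
    os.makedirs(leaf)
    # Removes c, then b, then a; the attempt on base fails silently
    # because it still contains keep.txt.
    os.removedirs(leaf)
    return base
```

After the call, base/a is gone while base and keep.txt remain. Without keep.txt, base itself would have been removed too, which is why os.rmdir() is the safer call when only one directory should go.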
