2012-11-21 102 views
1

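Python: write the number of files in each subdirectory to a CSV file

I want to compile the number of files in each subdirectory on a Linux system into an Excel spreadsheet.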

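The directories are generally laid out as maindir/person/task/somedata/files. However, the subdirectory layout varies (for example, some paths may have no 'task' directory), so I need Python to walk the file paths.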

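My problem is that I need all of the subdirectory names from 'person' down, but my current code (below) only records the nearest directory name and the file count. If anyone can help me solve this, it would be greatly appreciated!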

import os, sys, csv

outwriter = csv.writer(open("Subject_Task_Count.csv", 'w'))

dir_count = []
os.chdir('./../../')
rootDir = "./"  # set the directory you want to start from
for root, dirs, files in os.walk(rootDir):
    for d in dirs:
        a = str(d)
        count = 0
        for f in files:
            count += 1
        y = (a, count)
        dir_count.append(y)

for i in dir_count:
    outwriter.writerow(i)

Answers

0

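Your question isn't entirely clear to me, and you may want to reread the os.walk documentation. root is the directory currently being traversed, dirs lists the subdirectories immediately inside root, and files lists the files directly in root. As written, your code counts the same files (the ones in root) and records that number as the file count for every subdirectory of root.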
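To make the three values concrete, here is a minimal sketch (written for Python 3) that builds a throwaway tree and lists what os.walk yields for each directory. The helper name `describe_walk` and the names `person1`, `task1`, `notes.txt`, `a.txt`, and `b.txt` are made up for illustration; they are not from the question.

```python
import os
import tempfile


def describe_walk(top):
    """Return one (relative_root, dirs, files) tuple per directory visited."""
    rows = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # sorting in place makes the traversal order deterministic
        rows.append((os.path.relpath(root, top), list(dirs), sorted(files)))
    return rows


# Hypothetical layout: person1/notes.txt and person1/task1/{a.txt, b.txt}
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "person1", "task1"))
    for rel in ("person1/notes.txt", "person1/task1/a.txt", "person1/task1/b.txt"):
        open(os.path.join(top, *rel.split("/")), "w").close()
    walk_rows = describe_walk(top)

for row in walk_rows:
    print(row)
```

In the row for `person1`, files contains only `notes.txt`, which sits in `person1` itself, while dirs contains `task1`. Pairing each entry of dirs with `len(files)` therefore reports 1 for `task1` even though `task1` holds two files.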

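Here's what I came up with. Hopefully it's close to what you want; if not, modify it :) For each directory, it writes out the path, the number of files in that directory, and the number of files in that directory plus all of its subdirectories.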

import os
import csv

# Open the csv and write headers (Python 3: text mode with newline='').
with open("Subject_Task_Count.csv", 'w', newline='') as out:
    outwriter = csv.writer(out)
    outwriter.writerow(['Directory', 'FilesInDir', 'FilesIncludingSubdirs'])

    # Track total number of files in each subdirectory by absolute path
    totals = {}

    # topdown=False iterates lowest level (leaf) subdirectories first.
    # This way I can collect grand totals of files per subdirectory.
    for path, dirs, files in os.walk('.', topdown=False):
        files_in_current_directory = len(files)

        # Start with the files in the current directory and compute a
        # total for all subdirectories, which will be in the `totals`
        # dictionary already due to topdown=False.
        files_including_subdirs = files_in_current_directory
        for d in dirs:
            fullpath = os.path.abspath(os.path.join(path, d))

            # On my Windows system, Junctions weren't included in os.walk,
            # but would show up in the subdirectory list. This try skips
            # them because they won't be in the totals dictionary.
            try:
                files_including_subdirs += totals[fullpath]
            except KeyError as e:
                print('KeyError: {} may be symlink/junction'.format(e))

        totals[os.path.abspath(path)] = files_including_subdirs
        outwriter.writerow([path, files_in_current_directory, files_including_subdirs])
+0

Thank you very much for your help, this one works perfectly. Sorry I didn't get a chance to check it until today. – user1843473

3

You should try something along the lines of:

for root, dirs, files in os.walk(rootDir):
    print(root, len(files))

It prints each directory's path and the number of files directly inside it.
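If the spreadsheet needs every directory name below the starting point rather than only the nearest one, the same loop can split each relative path into its components. The following is a rough sketch for Python 3, not a tested solution for the asker's data; `count_rows` and `write_counts` are hypothetical helper names, and the count is placed in the first column as an assumption so that it stays aligned when paths differ in depth.

```python
import csv
import os


def count_rows(top):
    """Yield [file_count, component1, component2, ...] for each directory below top."""
    for root, dirs, files in os.walk(top):
        dirs.sort()  # deterministic traversal order
        rel = os.path.relpath(root, top)
        if rel == os.curdir:
            continue  # skip the starting directory itself
        # Count first, then one column per path component (person, task, ...).
        yield [len(files)] + rel.split(os.sep)


def write_counts(top, csv_path):
    """Write one CSV row per directory found under top."""
    with open(csv_path, "w", newline="") as out:
        csv.writer(out).writerows(count_rows(top))
```

Calling `write_counts("maindir", "Subject_Task_Count.csv")` would then produce rows such as a count followed by the person, task, and somedata directory names, with shorter rows for paths that have no 'task' level.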
