How to read the contents of txt files in different directories and rename other files accordingly

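I've just started using Python 3 and ran into the following problem: how do I read the contents of txt files in different directories and rename other files accordingly?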

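I downloaded a large number of PDF files from different journals for my thesis, but they are all named after their DOI instead of following the format "Author (Year) - Title". The documents are stored in different directories according to journal name and volume, for example: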

/Journal 1/ 
    /Vol. 1/ 
     file1.pdf 
     file1.txt 
     file2.pdf 
     file2.txt 
     filen.pdf 
     filen.txt 
    /Vol. 2/ 
     file1.pdf 
     file1.txt 
/Journal 2/ 
    ...

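Since I don't know how to read PDF content with Python, I wrote a very short bash script that converts the PDFs to plain txt files. The pdf and txt files have the same name and differ only in their file extension.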

I want to rename all the PDF files, and luckily the running text of every file contains a string I can use. This variable string sits between two static strings:

"Cite this article as: " AUTHOR/YEAR/TITLE ", Journal name".

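How do I get Python to go into each directory, read the TXT/PDF content, extract the variable string between the two fixed strings, and then rename the corresponding PDF file?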
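The extraction step on its own can be sketched with a regular expression. This is only a minimal sketch, assuming the two static strings look exactly as quoted above; `extract_citation` and the sample text are made up for illustration.

```python
import re


def extract_citation(text, journal_name):
    """Return the text between the two static strings, or None if absent.

    Hypothetical helper, not taken from the original post.
    """
    pattern = re.compile(
        r'Cite this article as:\s*(.*?),\s*' + re.escape(journal_name),
        re.DOTALL,  # let the match span line breaks in the txt file
    )
    match = pattern.search(text)
    return match.group(1).strip() if match else None


# Made-up sample text for illustration only.
sample = ('Some text. Cite this article as: Doe (2015) Example title, '
          'Journal name. More text.')
print(extract_citation(sample, 'Journal name'))
```

The non-greedy `(.*?)` stops at the first occurrence of the journal name, so commas inside the title itself do not cut the match short.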

If anyone knows how to do this with Python 3, I'd be very grateful.


2015-07-20 Telefonmann

This is rather broad, really. There are a lot of steps involved. At which point exactly are you stuck? – usr2564301

If you open the PDF files in Acrobat and look under File/Properties, do the metadata strings there contain this information? –

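No, they are not in the metadata strings. I'm stuck at looping over the directories and all files, and then at renaming the files. To find the string I use: import re; s = 'blablablablaAUTHORblablabla'; result = re.search('blablablabla(.*)blablablabla', s) – Telefonmann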
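The looping part could look roughly like the following. It is a sketch under the assumption that each txt file sits next to a pdf with the same base name; `iter_txt_pdf_pairs` is a hypothetical helper name, not something from the thread.

```python
import os


def iter_txt_pdf_pairs(base_dir):
    """Yield (txt_path, pdf_path) for every txt file with a matching pdf."""
    # os.walk visits base_dir and every directory below it.
    for root, dirs, files in os.walk(base_dir):
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() != '.txt':
                continue  # only txt files are of interest
            pdf_path = os.path.join(root, stem + '.pdf')
            if os.path.exists(pdf_path):
                yield os.path.join(root, name), pdf_path
```

Iterating over `iter_txt_pdf_pairs('.')` would then give one pair of paths per paper, which keeps the directory handling separate from the text extraction and the renaming.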

Finally got it working:

#__author__ = 'Telefonmann'
# -*- coding: utf-8 -*-

import os
import re
import shutil

# Matches the substring in between "To cite this article: " and
# "Journal of Quantitative Linguistics,". The \s* parts also tolerate
# typos like "Journal ofQuantitative ...".
r = re.compile(r'To\s*cite\s*this\s*article:\s*(.*?),\s*Journal\s*of\s*Quantitative\s*Linguistics\s*,')

for root, dirs, files in os.walk(os.getcwd()):  # loops through directories
    for file in files:  # loops through the files of each directory
        if not file.endswith('.txt'):  # only processes txt files
            continue

        full_path = os.path.join(root, file)

        with open(full_path, 'r', encoding='utf-8') as f:
            content = f.read().replace('\n', '')  # remove newlines

        m = r.search(content)
        if not m:
            # no citation string in this file, so there is no new name for it
            print('No title found in: ' + full_path)
            continue

        full_title = m.group(1)
        print("full_title: {0}".format(full_title))

        # removes/replaces forbidden characters in Windows file names
        full_title = (full_title.replace('<', '')
                      .replace('>', '')
                      .replace(':', ' -')
                      .replace('"', '')
                      .replace('/', '')
                      .replace('\\', '')
                      .replace('|', '')
                      .replace('?', '')
                      .replace('*', ''))

        # txt and pdf files only differ in their extension, so swapping
        # the extension gives the name of the matching pdf
        pdf_name = os.path.splitext(full_path)[0] + '.pdf'

        # for troubleshooting
        print('File: ' + file)
        print('Full Path: ' + full_path)
        print('Full Title: ' + full_title)
        print('PDF Name: ' + pdf_name)
        print('....................................')

        new_path = os.path.join(root, "{0}.pdf".format(full_title))

        if os.path.exists(pdf_name):
            print("all paths found")
            # makes a copy of the pdf file with the new name in the respective directory
            shutil.copy(pdf_name, new_path)
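
The step that strips characters forbidden in Windows file names can also be pulled into a small helper, which makes it easier to test in isolation. This is a sketch rather than part of the script above; `safe_filename` is a made-up name, and it is meant to apply the same rules (':' becomes ' -', the other forbidden characters are dropped).

```python
# Translation table: ':' is replaced by ' -', the other characters that
# Windows forbids in file names are deleted.
_FORBIDDEN_TABLE = str.maketrans(
    {':': ' -', **{char: None for char in '<>"/\\|?*'}}
)


def safe_filename(title):
    """Return title with characters forbidden in Windows file names removed.

    Hypothetical helper for illustration.
    """
    return title.translate(_FORBIDDEN_TABLE).strip()


print(safe_filename('Doe (2015): What is "style"?'))
```

Two papers that reduce to the same sanitised title would still end up with the same target name, so a check for an existing file before copying may be worth adding.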


2015-07-24 23:12:24 Telefonmann
