2017-06-03 73 views
0

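I'm new to Python, and I'm trying to extract the information between two headers in a text file using the code below, which currently doesn't work. How can I extract the information between two headers?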

with open('toysystem.txt','r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    i = 0
    lines = f.readlines()
    for line in lines:
        if line == start:
            keywords = lines[i+1]
        i += 1

For reference, the text file looks like this:

<Keywords> 
GTO 
</Keywords> 

Any ideas about what might be wrong with the code? Or perhaps a different way to approach the problem?

Thanks!

Answers

1
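  • Lines read from a file include a newline character at the end, so we should probably strip them before comparing,

  • the f object is an iterator over the file's lines, so we don't need the f.readlines method here.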
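To make the first point concrete, here is a small sketch of why the comparison in the question never succeeds. The string assigned to `line` is an assumption about what iterating over the file yields for the header line; it is not read from the real file.

```python
# Assumed value: what the file iterator yields for the header line.
line = "<Keywords>\n"
start = "<Keywords>"

# repr() makes the trailing newline visible.
print(repr(line))

# The raw line never equals the header because of the newline.
print(line == start)            # False

# Stripping trailing whitespace makes the comparison succeed.
print(line.rstrip() == start)   # True
```

Note that `rstrip()` with no arguments also removes trailing spaces, which helps if the header lines in the file end with stray whitespace.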

So we can write something like

with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    # Skip lines up to and including the start header.
    for line in f:
        if line.rstrip() == start:
            break
    # Collect lines until the end header is reached.
    for line in f:
        if line.rstrip() == end:
            break
        keywords.append(line)

which gives us

>>> keywords 
['GTO\n'] 

If you don't need the newline at the end of each keyword either, strip it too:

with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    for line in f:
        if line.rstrip() == start:
            break
    for line in f:
        if line.rstrip() == end:
            break
        keywords.append(line.rstrip())

>>> keywords 
['GTO'] 

But in this case it is better to create a generator of stripped lines, like

with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    stripped_lines = (line.rstrip() for line in f)
    for line in stripped_lines:
        if line == start:
            break
    for line in stripped_lines:
        if line == end:
            break
        keywords.append(line)

which does the same thing.
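The two consecutive loops work because they share a single iterator: the second loop resumes exactly where the first one stopped. The sketch below illustrates this with `io.StringIO` standing in for the real file (an assumption made so the example is self-contained).

```python
import io

# Stand-in for the real file; the contents mirror the example in the question.
f = io.StringIO("<Keywords>\nGTO\n</Keywords>\n")

stripped_lines = (line.rstrip() for line in f)

# The first loop consumes lines up to and including the start header.
for line in stripped_lines:
    if line == "<Keywords>":
        break

# Whatever is left in the generator starts right after the header.
remaining = list(stripped_lines)
print(remaining)  # ['GTO', '</Keywords>']
```

With a list instead of an iterator, the second loop would start again from the first line, so this pattern depends on the iterator being consumed only once.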


Finally, if you need the lines in a later part of the script, we can use f.readlines together with a generator of stripped lines:

with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    # Keep the raw lines around for later use in the script.
    lines = f.readlines()
    # The generator already strips each line, so no further rstrip() is needed.
    stripped_lines = (line.rstrip() for line in lines)
    for line in stripped_lines:
        if line == start:
            break
    for line in stripped_lines:
        if line == end:
            break
        keywords.append(line)

which gives us

>>> lines 
['<Keywords>\n', 'GTO\n', '</Keywords>\n'] 
>>> keywords 
['GTO'] 
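If the same extraction is needed in several places, the logic can be wrapped in a function. The following is only a sketch using a different technique than the loops above, `itertools.dropwhile` and `itertools.takewhile`; the function name `extract_between` is hypothetical and not part of any library.

```python
from itertools import dropwhile, takewhile


def extract_between(lines, start, end):
    """Return the stripped lines found between the start and end headers."""
    stripped = (line.strip() for line in lines)
    # Skip everything before the start header.
    after_start = dropwhile(lambda line: line != start, stripped)
    # Drop the start header itself (no error if it was never found).
    next(after_start, None)
    # Collect lines until the end header or the end of the input.
    return list(takewhile(lambda line: line != end, after_start))


sample = ['<Keywords>\n', 'GTO\n', '</Keywords>\n']
print(extract_between(sample, '<Keywords>', '</Keywords>'))  # ['GTO']
```

Because the function accepts any iterable of lines, it can be given an open file object directly as well as a list returned by `readlines()`. If the start header is missing it returns an empty list.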


0

Why not use Python's re module instead and parse the file with a regular expression?

import re
with open('toysystem.txt', 'r') as f:
    contents = f.read()
# findall returns a list with the text captured by the group (...) for every match.
# re.DOTALL lets the captured text span several lines; extend the pattern as needed.
keywords = re.findall(r'<Keywords>\s*(.*?)\s*</Keywords>', contents, re.DOTALL)
print(keywords)

For your file, it prints

['GTO'] 

For more about regular expressions in Python, check Tutorialspoint, or the resources for Python 3 and Python 2.
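As a rough illustration of how the regular-expression approach behaves when a section holds more than one line, here is a self-contained sketch. The sample text, including the second keyword `LEO` and the `<Other>` section, is made up for the example and does not come from the asker's file.

```python
import re

# Made-up sample text: one Keywords section with two entries, plus another section.
text = "<Keywords>\nGTO\nLEO\n</Keywords>\n<Other>\nx\n</Other>\n"

# re.DOTALL lets '.' match newlines, so the lazy group can span several lines.
pattern = re.compile(r"<Keywords>\s*(.*?)\s*</Keywords>", re.DOTALL)

# One string per <Keywords> section found in the text.
blocks = pattern.findall(text)
print(blocks)  # ['GTO\nLEO']

# Split each section into its individual lines.
keyword_lists = [block.splitlines() for block in blocks]
print(keyword_lists)  # [['GTO', 'LEO']]
```

The pattern is case-sensitive, so it has to match the capitalisation of the headers in the file; passing `re.IGNORECASE` as well would relax that.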