Reading multiple blocks of a file between start and stop flags

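I am trying to read parts of a file into numpy arrays. The different parts of the file have similar start and stop flags. So far I have found a method that works, but it only reads one section of the input file before the file has to be reopened.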

My code at the moment is:

with open("myFile.txt") as f:
    array = []
    parsing = False
    for line in f:
        if line.startswith('stop flag'):
            parsing = False
        if parsing:
            pass  # do things to the data
        if line.startswith('start flag'):
            parsing = True

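I found this code in this question.

With this code, I need to reopen the file and read through it again for each section.

Is there a way to read all of the sections without having to reopen the file for each one?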

2015-07-19 user27630

How big is your file, and how comfortable are you with generators? – NightShadeQueen

You can use itertools.takewhile to take lines up to the stop flag each time you reach a start flag:

from itertools import takewhile

with open("myFile.txt") as f:
    array = []
    for line in f:
        if line.startswith('start flag'):
            # lazily yields the lines of this section, up to the stop flag
            data = takewhile(lambda x: not x.startswith("stop flag"), f)
            # use data and repeat
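
As a minimal sketch of the takewhile approach, the snippet below uses io.StringIO as a stand-in for the open file (the sample lines are made up for illustration) and collects each section into a list. Note that takewhile also reads and drops the stop flag line, and that each block has to be consumed before the outer loop moves on.

```python
import io
from itertools import takewhile

# Stand-in for an open file; the sample content is an assumption for illustration.
f = io.StringIO(
    "header\n"
    "start flag\n"
    "1 2\n"
    "3 4\n"
    "stop flag\n"
    "noise\n"
    "start flag\n"
    "5 6\n"
    "stop flag\n"
)

sections = []
for line in f:
    if line.startswith("start flag"):
        # takewhile pulls from the same iterator as the outer loop; the list
        # comprehension consumes the whole block, including the stop flag
        # line, which is read and discarded.
        block = takewhile(lambda x: not x.startswith("stop flag"), f)
        sections.append([row.strip() for row in block])

print(sections)  # [['1 2', '3 4'], ['5 6']]
```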

Or just use an inner loop:

with open("myFile.txt") as f:
    array = []
    for line in f:
        if line.startswith('start flag'):
            # beginning of a section; use the first line here if needed
            for line in f:
                # check for the end of the section, breaking if we find the stop line
                if line.startswith("stop flag"):
                    break
                # else process lines from the section

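A file object returns itself as its iterator, so the file pointer keeps moving as you iterate over f. When you reach a start flag, you start processing a section and continue until you hit the stop flag. There is no reason to reopen the file: just iterate over the lines of the file once and use the sections as you go. If the start and stop flag lines are considered part of the section, make sure you use those lines as well.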
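
The single-pass idea can be sketched as a small generator. This is only an illustration: iter_sections is a hypothetical helper name, io.StringIO stands in for the real file, and the sample lines are invented. It relies on the inner loop continuing from wherever the outer loop stopped, because both loops share one iterator.

```python
import io


def iter_sections(lines, start="start flag", stop="stop flag"):
    """Yield one list of lines per start/stop section (hypothetical helper)."""
    it = iter(lines)  # for a file object this is the file object itself
    for line in it:
        if line.startswith(start):
            section = []
            for line in it:  # resumes right after the start flag line
                if line.startswith(stop):
                    break
                section.append(line.rstrip("\n"))
            yield section


sample = io.StringIO(
    "start flag\na\nb\nstop flag\nskip me\nstart flag\nc\nstop flag\n"
)
is_own_iterator = iter(sample) is sample  # file-like objects iterate over themselves
result = list(iter_sections(sample))
print(is_own_iterator, result)  # True [['a', 'b'], ['c']]
```

Each yielded list could then be converted to whatever structure is needed, without reopening the file.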

2015-07-19 23:39:27

The nested loop solution works perfectly. Thanks. – user27630

You have an indentation problem. Your code should look like this:

with open("myFile.txt") as f:
    array = []
    parsing = False
    for line in f:
        if line.startswith('stop flag'):
            parsing = False
        if parsing:
            pass  # do things to the data
        if line.startswith('start flag'):
            parsing = True

2015-07-19 23:39:11 Chaker

Let's say this is the file you are reading:

**starting**
blabla
blabla
**starting**
bleble
bleble
**starting**
bumbum
bumbum

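This is the program's code:

file = open("testfile.txt", "r")
data = file.read()  # read the whole file into one string
file.close()
data = data.split("**starting**")  # one list element per section
print(data)

This is the output: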

['', '\nblabla\nblabla\n', '\nbleble\nbleble\n', '\nbumbum\nbumbum']

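Afterwards you can del the empty elements or do other things with your data. split is a built-in method of string objects, and it can take more complex strings as its argument.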
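
As a rough sketch of that clean-up step, the snippet below drops the empty elements and breaks each remaining block into lines. The text is inlined here instead of being read from testfile.txt, and the variable names are illustrative. Keep in mind that this approach loads the whole file into memory and only handles a start marker, not a separate stop flag.

```python
# Same content as the example file above, inlined so the sketch is self-contained.
text = (
    "**starting**\nblabla\nblabla\n"
    "**starting**\nbleble\nbleble\n"
    "**starting**\nbumbum\nbumbum"
)

parts = text.split("**starting**")

# Skip elements that are empty or whitespace-only, then split each block into lines.
blocks = [part.strip().splitlines() for part in parts if part.strip()]

print(blocks)
# [['blabla', 'blabla'], ['bleble', 'bleble'], ['bumbum', 'bumbum']]
```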

2015-07-19 23:43:08 Laszlowaty

A solution similar to yours is:

result = []
parse = False
with open("myFile.txt") as f:
    for line in f:
        if line.startswith('stop flag'):
            parse = False
        elif line.startswith('start flag'):
            parse = True
        elif parse:
            result.append(line)
        else:  # not needed, but I like to always add an else clause
            continue
print(result)

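But you could also use an inner loop or itertools.takewhile, as the other answers suggest. Using takewhile in particular should be considerably faster for really large files.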

2015-07-19 23:47:56
