Python: function to skip the comment lines of an open file and pass on the file object at the current line

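I am trying to write some Python code to edit certain (existing) input and output files of a particular piece of software. All the files I am interested in may begin with comment lines whose first character is # (the number of comment lines is not known in advance).

I always want to skip these comment lines so I can read/store the text that matters. So I would like to create a function that, given a file object opened in read mode, skips the comment lines so that the next read call on the file object starts at the first non-comment line of the file. At the moment I am trying to do this with a class that has a skip_comments() method (see the code below):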

import os

class FileOperations:

    def __init__(self, directory, filename):
        self.directory = directory
        self.filename = filename
        self.filepath = os.path.abspath(os.path.join(directory, filename))
        self.fo = open(self.filepath, 'r')

    def skip_comments(self):
        """ Passes the current position to the location of the first non-comment
        line of self.fo"""

        for line in self.fo:
            if not line.lstrip().startswith('#'):
                break
        print line  ## Just to check if in correct spot

Instantiating the class works, and I can use the usual file-object operations such as read() and seek():

In [47]: fh = FileOperations('file_directory','file.txt')
In [48]: fh.fo.read(10)
Out[48]: '#This file'
In [49]: fh.fo.seek(0)

But when I call the skip_comments() method and then try to read from the file object, I run into a problem:

In [50]: fh.skip_comments() 
20 740 AUX IFACE AUX QFACT AUX CELLGRP 

Out[50]: <open file '... file_dir\file.txt', mode 'r' at 0x0000000008797D20> 
In [51]: fh.fo.read(10) 
--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-51-20f04ae797fe> in <module>() 
----> 1 fh.fo.read(10) 

ValueError: Mixing iteration and read methods would lose data

Can someone help me fix this bug or suggest a better way to do this? Thanks!

Asked 2014-10-09 by dhltp

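[This question](http://stackoverflow.com/questions/4762262/is-it-safe-to-mix-readline-and-line-iterators-in-python-file-processing) explains the cause of the error. Basically, you can't mix 'f.read()' with 'for line in f', because 'next(f)' (which is called when you iterate) uses a read-ahead buffer internally to improve performance, and that is incompatible with 'read' or 'readline', which don't know about the read-ahead buffer. – dano 2014-10-09 18:38:37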
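One way around the read-ahead buffer is to avoid iteration entirely and step through the header with readline(), remembering the position of each line with tell() and rewinding with seek() once a non-comment line turns up. The following is only a minimal sketch under that assumption, not the asker's or the answerer's code: the comment_char parameter and the io.StringIO stand-in for a real file are illustrative.

```python
import io

def skip_comments(fo, comment_char="#"):
    """Leave fo positioned at the start of its first non-comment line."""
    while True:
        pos = fo.tell()            # remember where this line starts
        line = fo.readline()       # readline(), not iteration: no read-ahead conflict
        if not line:               # end of file: nothing but comments (or empty)
            break
        if not line.lstrip().startswith(comment_char):
            fo.seek(pos)           # rewind to the start of this line
            break
    return fo

# In-memory stand-in for a real file opened in read mode (hypothetical content)
sample = io.StringIO("#This file\n# more header\n20 740 AUX IFACE\nrest\n")
skip_comments(sample)
first = sample.read(10)            # ordinary read() calls work from here on
print(first)
```

Because the function never iterates over the file object, later calls to read(), readline() or seek() on the same object behave normally.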

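What you want to do is turn the skip_comments() function into a generator. The generator below yields the non-comment lines of the file whose name you pass to it.

So: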

def skip_comments(filename):
    # Text mode ('r') so that each line is a str and can be compared with '#'
    with open(filename, 'r') as f:
        for line in f:
            # Yield every line that is not a comment
            if not line.strip().startswith('#'):
                yield line

# then, to use the generator you've just created:
filename = 'file.txt'  # path to your input file
for line in skip_comments(filename):
    pass  # do stuff with line

# if you want all the lines at the same time...
lines = list(skip_comments(filename))
# lines is now a list of all non-comment lines in the file

Edit: a faster (and denser) version would be skip_comments = lambda filename: (line for line in open(filename, 'r') if not line.startswith('#')). This uses a generator expression, which is faster (it saves about a third of the time on my machine).

Answered 2014-10-09 17:57:58

Why not just 'if not ...: yield ...' and drop the else? – 2014-10-09 17:59:22

@AaronHall: Good point. I was thinking that using 'break' made the issue more obvious. – 2014-10-09 18:00:27

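@ChinmayKanchi: I went and read up on generators and ended up more confused. Could you expand your example to show how to actually do what I want? In other words: given your generator skip_comments, how do I apply it and then perform some other operations on the text that follows the comments in a given file? – dhltp 2014-10-16 23:44:51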
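For the follow-up question, one possible approach is to drop only the leading comment lines and then process whatever follows in an ordinary loop. The sketch below uses itertools.dropwhile from the standard library for that; the helper names is_comment and lines_after_header, and the sample content, are made up for illustration and are not part of the answer above.

```python
import io
from itertools import dropwhile

def is_comment(line):
    """True for lines whose first non-blank character is '#'."""
    return line.lstrip().startswith("#")

def lines_after_header(fo):
    """Return an iterator over fo starting at its first non-comment line."""
    return dropwhile(is_comment, fo)

# In-memory stand-in for an open file (hypothetical content)
sample = io.StringIO(
    "#This file\n"
    "# header line\n"
    "20 740 AUX\n"
    "# not in the header, so it is kept\n"
    "1 2 3\n"
)

# "Do stuff" with every line after the header: here, split into fields
body = [line.split() for line in lines_after_header(sample)]
print(body[0])
```

Unlike the generator in the answer, which filters out every comment line in the file, dropwhile stops testing after the first non-comment line, so later lines starting with # are passed through unchanged. All further work is done on the iterator rather than by calling read() on the file object, so iteration and read methods are never mixed.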
