2017-05-29

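I have a large file like the input below, where every 4 lines form one record that begins with a line starting with @. The second line of each record (the one after the @ line) is a sequence of characters, but for some IDs this line is empty. When that happens, I want to delete all 4 lines belonging to that ID.
I tried the Python code below to edit the text file, but it raised an error.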

Input:

@M00872:361:000000000-D2GK2:1:1101:16003:1351 1:N:0:1 
ATCCGGCTCGGAGGA 
+ 
1AA?ADDDADDAGGG 
@M00872:361:000000000-D2GK2:1:1101:15326:1352 1:N:0:1 
GCGCAGCGGAAGCGTGCTGGG 
+ 
CCCCBCDCCCCCGGEGGGGGG 
@M00872:361:000000000-D2GK2:1:1101:16217:1352 1:N:0:1 

+ 

Output:

@M00872:361:000000000-D2GK2:1:1101:16003:1351 1:N:0:1 
ATCCGGCTCGGAGGA 
+ 
1AA?ADDDADDAGGG 
@M00872:361:000000000-D2GK2:1:1101:15326:1352 1:N:0:1 
GCGCAGCGGAAGCGTGCTGGG 
+ 
CCCCBCDCCCCCGGEGGGGGG 


import fileinput 

with fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak") as f:
    for l in f:
        if l.strip().startswith("@"):
            c = 2
            next_line = f.readline().strip()
            if not next_line:
                while c:
                    c -= 1
                    try:
                        next(f)
                    except StopIteration:
                        break
            else:
                print(l.strip())
                print(next_line.strip())
                while c:
                    c -= 1
                    try:
                        print(next(f).strip())
                    except StopIteration:
                        break

It didn't work, though; it gave this error:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
AttributeError: FileInput instance has no attribute '__exit__' 

Do you know how to fix this?


Which Python version are you using? I think older versions don't support using fileinput in a with statement. So use f = fileinput.input(files="4415_pool_TCP_Ctrl.fastq", inplace=True, backup="file.bak") instead.


The Python version is 2.7. – ARM

Answers


It looks like the fileinput.FileInput class doesn't implement __exit__(), which it needs if you want to use it in a with fileinput.input() ... statement.
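One way to keep the with form on 2.7 is contextlib.closing(), which wraps any object that has a close() method (as FileInput does) in a context manager that calls close() on exit. A minimal sketch; read_all is a hypothetical helper used only to show the pattern:

```python
import contextlib
import fileinput


def read_all(paths):
    # Hypothetical helper. contextlib.closing() supplies __enter__/__exit__ and
    # calls f.close() when the block ends, which is what a 2.7 FileInput is
    # missing for use in a with statement.
    with contextlib.closing(fileinput.input(files=paths)) as f:
        return [line.rstrip("\n") for line in f]
```

The same wrapper should work for the original in-place loop: with contextlib.closing(fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak")) as f: and the loop body unchanged.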


I think the problem is your Python version (2.7), which doesn't support using FileInput in a with statement.

Use

f = fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak") 

instead of

with fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak") as f:
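A minimal sketch of the whole in-place filter written without with, so it should run on both 2.7 and 3. It assumes every record is exactly 4 lines (header, sequence, +, quality) even when the sequence is empty, as in the input above; drop_empty_reads is a hypothetical name, and the try/finally makes sure f.close() restores sys.stdout even if something fails midway:

```python
import fileinput
import sys


def drop_empty_reads(path):
    # Hypothetical helper: rewrite `path` in place, keeping a .bak copy.
    f = fileinput.input(files=path, inplace=True, backup=".bak")
    try:
        record = []
        for line in f:
            record.append(line)
            if len(record) == 4:
                # record[1] is the sequence line; keep the record only if it is non-empty.
                # In inplace mode sys.stdout points at the rewritten file.
                if record[1].strip():
                    sys.stdout.write("".join(record))
                record = []
        # Note: an incomplete trailing record (fewer than 4 lines) is dropped.
    finally:
        # Closes the output file and restores the real sys.stdout.
        f.close()
```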

Although the with statement was added in 2.5, I don't think fileinput was ever updated to support it (contextlib?).

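Your code will work in Python 3 (FileInput can be used as a context manager since 3.2), but not in 2.7. To fix this, either use Python 3 or port the code to iterate over the lines yourself, for example: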

with open(filename, "r") as f:
    lines = f.readlines()

for line in lines:
    # do whatever you need to do for each line
    pass

As for a solution to your actual problem (on 2.7), I would do something like this:

# Read all the lines into a buffer
with open('input.fastq', 'r') as source:
    source_buff = iter(source.readlines())

with open('output.fastq', 'w') as out_file:
    for line in source_buff:
        if line.strip().startswith('@'):
            prev_line = line
            line = next(source_buff)

            if line.strip():
                # if the 2nd line is not empty, write the whole block to the output file
                out_file.write(prev_line)
                out_file.write(line)
                out_file.write(next(source_buff))
                out_file.write(next(source_buff))
            else:
                # empty sequence: skip the block; its remaining lines don't start
                # with '@', so the loop passes over them without writing
                pass

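I know .fastq files can sometimes be very large, so I wouldn't recommend reading the whole file into a buffer; instead, put this code in a loop that reads 4 lines (or however many lines a block has) at a time.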
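A sketch of that idea, reading one 4-line block at a time with itertools.islice so only the current record is held in memory. It writes to a separate output file instead of editing in place, assumes the 4-line layout from the question, and filter_fastq is a hypothetical name; it should work on both 2.7 and 3:

```python
from itertools import islice


def filter_fastq(src_path, dst_path):
    """Copy 4-line FASTQ records from src_path to dst_path, skipping records
    whose sequence line (the 2nd line of the block) is empty."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        while True:
            record = list(islice(src, 4))  # next 4 lines, or fewer at EOF
            if not record:
                break  # end of file
            # A truncated last block is copied as-is if its sequence line is non-empty.
            if len(record) > 1 and record[1].strip():
                dst.writelines(record)
```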