Python: put specific lines of a file into a list

I ran into the following problem:

Given a file with the following structure:

'>some cookies 
chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple 
'>some icecream 
cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee 
'>some other stuff 
letsseewhatfancythings 
wegotinhere

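Goal: put all entries that follow each line containing ">" into a list, as a single string per section.

My function goes through each line of the file. If the line does not contain ">", it strips the "\n" and concatenates the line onto the lines collected so far. If the line does contain ">", it appends the concatenated string to the list and "clears" the string seq so that the next sequence can be concatenated.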
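The behaviour described above can be sketched roughly as follows. This is a reconstruction from the description only, not the asker's original code: the function name collect_sections_as_described, the SAMPLE constant, and the guard that skips empty strings are all assumptions made for illustration.

```python
import io

# Sample input modelled on the file shown in the question.
SAMPLE = (
    "'>some cookies\n"
    "chocolatejelly\n"
    "peanutbuttermacadamia\n"
    "doublecoconutapple\n"
    "'>some icecream\n"
    "cherryvanillaamaretto\n"
    "peanuthaselnuttiramisu\n"
    "bananacoffee\n"
    "'>some other stuff\n"
    "letsseewhatfancythings\n"
    "wegotinhere\n"
)


def collect_sections_as_described(lines):
    """Hypothetical reconstruction of the loop described in the question."""
    result = []
    seq = ''
    for line in lines:
        if '>' not in line:
            # Body line: drop the trailing newline and concatenate.
            seq += line.rstrip()
        else:
            # Header line: store what was collected, then start over.
            if seq:  # assumption: skip the empty string at the first header
                result.append(seq)
            seq = ''
    # Nothing happens after the loop, so the text still held in seq is lost.
    return result


result = collect_sections_as_described(io.StringIO(SAMPLE))
print(result)
```

Run on the sample, this sketch returns only two strings, because the section after the last header is never appended.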

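Problem: with the example input file, it only puts the content of "some cookies" and "some icecream" into the list, but not the content of "some other stuff". So we get the following result: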

[chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple, cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee] but not 

[chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple, cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee, letsseewhatfancythings 
wegotinhere]

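Any idea what is wrong here? There is probably a logic error in the iteration that I am overlooking, but I can't see where.

Thanks in advance for any hints!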

2011-04-17 Daniyal

Apologies, and thanks to Manoj Govindan for the edit! – Daniyal 2011-04-17 14:39:54

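The problem is that you only save the current section seq when you hit a line with '>' in it. When the file ends, that last section is still open, but you never save it.

The simplest way to fix your program is this: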

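def parseSequenceIntoDictionary(filename):
    lis = []
    seq = ''
    with open(filename, 'r') as fp:
        for line in fp:
            if '>' not in line:
                # body line: strip the trailing newline and concatenate
                seq += line.rstrip()
            else:
                # header line: store the section collected so far
                lis.append(seq)
                seq = ''
    # the file ended
    lis.append(seq)  # store the last section
    # drop empty sections, e.g. the one stored at the very first header
    return [s for s in lis if s]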

By the way, you should use if line.startswith("'>"): to prevent possible errors.
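To illustrate why the stricter check can matter, here is a small sketch comparing the two tests. The helper names and the sample body line are hypothetical; the marker "'>" is taken from the file shown in the question.

```python
def is_header_loose(line):
    # Treats any line containing '>' anywhere as a header.
    return '>' in line


def is_header_strict(line, marker="'>"):
    # Treats a line as a header only if it begins with the marker.
    return line.startswith(marker)


header_line = "'>some cookies\n"
body_line = "left->right\n"  # hypothetical body line that happens to contain '>'

for line in (header_line, body_line):
    print(repr(line), is_header_loose(line), is_header_strict(line))
```

With the loose test, the body line would be mistaken for a header and silently dropped from its section; the startswith test leaves it in place.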

2011-04-17 15:28:12

The "# store the last section" part was the idea I was missing. Many thanks for the help, and also for the suggestion to use line.startswith(str). – Daniyal 2011-04-17 15:42:28

Well, you can simply split on '> (if I understood you correctly):

>>> s=""" 
... '>some cookies 
... chocolatejelly 
... peanutbuttermacadamia 
... doublecoconutapple 
... '>some icecream 
... cherryvanillaamaretto 
... peanuthaselnuttiramisu 
... bananacoffee 
... '>some other stuff 
... letsseewhatfancythings 
... wegotinhere """ 
>>> s.split("'>") 
['\n', 'some cookies \nchocolatejelly \npeanutbuttermacadamia \ndoublecoconutapple \n', 'some icecream \ncherryvanillaamaretto \npeanuthaselnuttiramisu \nbananacoffee \n', 'some other stuff \nletsseewhatfancythings \nwegotinhere '] 
>>>

2011-04-17 14:40:25 kurumi

This solution is appealing. But how do I force the split to happen after the line containing '>'? – Daniyal 2011-04-17 15:01:27
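One possible way to address this follow-up, sketched under some assumptions: split on the marker first, then cut each chunk at its first newline so that the header text is discarded and only the lines after it are kept. The function name split_sections is hypothetical, and the sketch assumes that the marker "'>" only ever appears at the start of a header line and that nothing but whitespace precedes the first header.

```python
def split_sections(text, marker="'>"):
    """Return the body of each section as one concatenated string."""
    sections = []
    for chunk in text.split(marker):
        # Everything up to the first newline is the header text; drop it.
        _header, _sep, body = chunk.partition('\n')
        # Join the remaining lines, stripping newlines and stray spaces.
        joined = ''.join(part.strip() for part in body.splitlines())
        if joined:
            sections.append(joined)
    return sections


s = """
'>some cookies
chocolatejelly
peanutbuttermacadamia
doublecoconutapple
'>some icecream
cherryvanillaamaretto
peanuthaselnuttiramisu
bananacoffee
'>some other stuff
letsseewhatfancythings
wegotinhere"""

sections = split_sections(s)
print(sections)
```

Because the whole text is split at once, the last section needs no special handling here, unlike in the line-by-line loop.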

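You only append seq to the result list when a new line with > is found. So at the end you have a filled seq (the data you are missing), but you never add it to the result list. After your loop, append seq if it contains any data, and you should be fine.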

2011-04-17 14:41:02 Achim

Ah, I see, but how do I add seq only if it contains some data? – Daniyal 2011-04-17 15:02:46
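A minimal sketch of one way to do that: an empty string is falsy in Python, so a plain if seq: after the loop is enough. The function name collect_sections and the choice to pass in an iterable of lines (rather than a filename) are assumptions made to keep the example self-contained.

```python
def collect_sections(lines, marker="'>"):
    result = []
    seq = ''
    for line in lines:
        if line.startswith(marker):
            # Header line: store the previous section, if there is one.
            if seq:
                result.append(seq)
            seq = ''
        else:
            # Body line: strip whitespace/newline and concatenate.
            seq += line.strip()
    # After the loop: add the last section only if it holds data.
    if seq:
        result.append(seq)
    return result


lines = [
    "'>some cookies\n",
    "chocolatejelly\n",
    "peanutbuttermacadamia\n",
    "'>some other stuff\n",
    "letsseewhatfancythings\n",
    "wegotinhere\n",
]
collected = collect_sections(lines)
print(collected)
```

The same check inside the loop also avoids storing an empty string when the file begins with a header line.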

my_list = []
with open('file_in.txt') as f:
    for line in f:
        # keep only the text of the header lines
        if line.startswith("'>"):
            my_list.append(line.strip().split("'>")[1])

print(my_list)  # ['some cookies', 'some icecream', 'some other stuff']

2011-04-17 15:14:53 snippsat

import re

def parseSequenceIntoDictionary(filename, regx=re.compile('^.*>.*$', re.M)):
    with open(filename) as f:
        # split the whole text at every line that contains '>'
        for el in regx.split(f.read()):
            if el:
                yield el.replace('\n', '')

print(list(parseSequenceIntoDictionary('aav.txt')))

2011-04-17 17:22:12 eyquem
