2014-10-19

I am trying to parse, in Python, a text file with the following format and write it out as CSV:

+++++ 
line1 
line2 
<<<<< 
+++++ 
rline1 
rline2 
<<<<< 

Here `+++++` marks the start of a record and `<<<<<` marks the end of a record.

I want to write the whole text out as CSV in the following format:

line1, line2 
rline1, rline2 

I tried something like this:

lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
output_lines = []

for line in lines:
    if (line == "+++++") or not (line == "<<<<<"):
        if (line == "<<<<<"):
            output_lines.append(line)
            output_lines.append(",")

print(output_lines)

I don't know how to proceed from here.

Answers

Collect lines in a nested loop until you reach the end-of-record marker, then write the resulting list to the CSV file:

import csv

with open(inputfilename) as infh, open(outputfilename, 'w', newline='') as outfh:
    writer = csv.writer(outfh)
    for line in infh:
        if not line.startswith('+++++'):
            continue

        # found start, collect lines until end-of-record
        row = []
        for line in infh:
            if line.startswith('<<<<<'):
                # found end, end this inner loop
                break
            row.append(line.rstrip('\n'))

        if row:
            # lines for this record are added to the CSV file as a single row
            writer.writerow(row)

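The outer loop takes lines from the input file but skips anything that does not look like the start of a record. Once a start is found, a second, inner loop pulls more lines from the same file object and, as long as they do not look like the end of a record, adds them to a list (with the line separator stripped).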

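When the end of the record is found, the inner loop ends; if any lines were collected in the `row` list, they are written to the CSV file as a single row. The outer loop then carries on from the line after the end marker.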
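Both loops iterate over the same file object, which is why the inner loop resumes exactly where the outer loop stopped. Below is a minimal sketch of the same technique as a reusable generator, assuming the input is any iterable of lines (an open file, or a plain list such as the `lines` list from the question). The helper name `iter_records` and its `start`/`end` parameters are hypothetical, not part of the answer above.

```python
import csv
import sys


def iter_records(lines, start='+++++', end='<<<<<'):
    """Yield one list of values per start/end-delimited record.

    Hypothetical helper: `lines` may be an open file or any iterable of strings.
    """
    it = iter(lines)  # one shared iterator, so both loops consume the same stream
    for line in it:
        if not line.startswith(start):
            continue  # skip anything outside a record
        row = []
        for line in it:  # continues from the line after the start marker
            if line.startswith(end):
                break  # end of this record
            row.append(line.rstrip('\n'))
        if row:
            yield row


lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
records = list(iter_records(lines))
print(records)  # [['line1', 'line2'], ['rline1', 'rline2']]
csv.writer(sys.stdout).writerows(records)
```

The explicit `iter()` call matters for lists: looping over a list twice creates two independent iterators that both start at the beginning, whereas a file object is its own iterator, so the original answer gets the shared position for free.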

Demo:

>>> import csv
>>> from io import StringIO
>>> import sys
>>> demo = StringIO('''\
... +++++
... line1
... line2
... <<<<<
... +++++
... rline1
... rline2
... <<<<<
... ''')
>>> writer = csv.writer(sys.stdout)
>>> for line in demo:
...     if not line.startswith('+++++'):
...         continue
...     row = []
...     for line in demo:
...         if line.startswith('<<<<<'):
...             break
...         row.append(line.rstrip('\n'))
...     if row:
...         writer.writerow(row)
...
line1,line2
13
rline1,rline2
15

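The number after each written line is the return value of `writer.writerow()`: the number of characters written, including the `\r\n` line terminator. The interactive interpreter echoes it; in a script it is simply discarded.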
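To see where 13 and 15 come from, here is a small check, assuming Python 3, where `writerow()` returns whatever the underlying file's `write()` returns (a character count for a text stream such as `io.StringIO`). The default line terminator is `\r\n`, so `line1,line2` (11 characters) becomes 13 and `rline1,rline2` (13 characters) becomes 15.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

n1 = writer.writerow(['line1', 'line2'])    # writes 'line1,line2\r\n'
n2 = writer.writerow(['rline1', 'rline2'])  # writes 'rline1,rline2\r\n'
print(n1, n2)  # 13 15
print(repr(buf.getvalue()))  # 'line1,line2\r\nrline1,rline2\r\n'
```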

Maybe something like this?

from itertools import groupby
import csv

lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']

# remove the +++++s, so that only the <<<<<s indicate line breaks
# (compare with !=, not "is not": string identity is not guaranteed)
cleaned_list = [x for x in lines if x != "+++++"]

# split at the <<<<<s: keep only the runs that are not end markers
rows = [list(group) for k, group in groupby(cleaned_list, lambda x: x == "<<<<<") if not k]

with open('result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)

with open('result.csv') as f:
    print(f.read())

Nice use of `groupby`, but you may want to add some description of what is happening here. – PaulMcG 2014-10-19 13:51:46
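To expand on what `groupby` does here: `itertools.groupby` groups *consecutive* items that share the same key. With the key `lambda x: x == "<<<<<"`, the cleaned list splits into alternating runs of data lines (key `False`) and end markers (key `True`); keeping only the `False` runs gives one list per record. A short sketch that prints the groups:

```python
from itertools import groupby

lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
cleaned_list = [x for x in lines if x != '+++++']

# Each (key, group) pair is one run of consecutive items with the same key.
groups = [(k, list(g)) for k, g in groupby(cleaned_list, lambda x: x == '<<<<<')]
for k, g in groups:
    print(k, g)
# False ['line1', 'line2']
# True ['<<<<<']
# False ['rline1', 'rline2']
# True ['<<<<<']

rows = [g for k, g in groups if not k]
print(rows)  # [['line1', 'line2'], ['rline1', 'rline2']]
```

Because `groupby` only merges adjacent items, this approach relies on each record's data lines sitting between end markers; unlike the nested-loop answer, it does not check that a record actually begins with `+++++`.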