2016-11-07

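I wrote code that adds the numbers from two different text files. For very large data (2-3 GB) I got a MemoryError, so I am writing new code that uses a few functions to avoid loading the whole data set into memory. However, the functions do not return the lists correctly.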

The code opens an input file 'd.txt' and reads the numbers that follow certain lines of the larger data set, which looks like this:

SCALAR 
ND 3 
ST 0 
TS 1000 
1.0 
1.0 
1.0 
SCALAR 
ND 3 
ST 0 
TS 2000 
3.3 
3.4 
3.5 
SCALAR 
ND 3 
ST 0 
TS 3000 
1.7 
1.8 
1.9 

It then adds to them the numbers read from a smaller text file 'e.txt', which looks like this:

SCALAR 
ND 3 
ST 0 
TS 0 
10.0 
10.0 
10.0 

The result should be written to a text file 'output.txt' like this:

SCALAR 
ND 3 
ST 0 
TS 1000 
11.0 
11.0 
11.0 
SCALAR 
ND 3 
ST 0 
TS 2000 
13.3 
13.4 
13.5 
SCALAR 
ND 3 
ST 0 
TS 3000 
11.7 
11.8 
11.9 

This is the code I wrote:

def add_list_same(list1, list2):
    """
    list2 has the same size as list1
    """
    c = [a+b for a, b in zip(list1, list2)]
    print(c)
    return c


def list_numbers_after_ts(n, f):
    result = []
    for line in f:
        if line.startswith('TS'):
            for node in range(n):
                result.append(float(next(f)))
    return result


def writing_TS(f1):
    TS = []
    ND = []
    for line1 in f1:
        if line1.startswith('ND'):
            ND = float(line1.split()[-1])
        if line1.startswith('TS'):
            x = float(line1.split()[-1])
            TS.append(x)
    return TS, ND


with open('d.txt') as depth_dat_file, \
        open('e.txt') as elev_file, \
        open('output.txt', 'w') as out:
    m = writing_TS(depth_dat_file)
    print('number of TS', m[1])
    for j in range(0,int(m[1])-1):
        i = m[1]*j
        out.write('SCALAR\nND {0:2f}\nST 0\nTS {0:2f}\n'.format(m[1], m[0][j]))
        list1 = list_numbers_after_ts(int(m[1]), depth_dat_file)
        list2 = list_numbers_after_ts(int(m[1]), elev_file)
        Eh = add_list_same(list1, list2)
        out.writelines(["%.2f\n" % item for item in Eh])

The resulting output.txt looks like this:

SCALAR 
ND 3.000000 
ST 0 
TS 3.000000 
SCALAR 
ND 3.000000 
ST 0 
TS 3.000000 
SCALAR 
ND 3.000000 
ST 0 
TS 3.000000 

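Adding the lists does not work, even though each function works when I test it on its own. I cannot find the error. I have changed a lot, but it still does not work. Any suggestions? I would really appreciate any help!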
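One plausible cause, offered as an observation rather than a confirmed diagnosis: a file object can only be iterated once. After writing_TS has looped over depth_dat_file to the end, a later call to list_numbers_after_ts on the same handle has nothing left to read and returns an empty list. The sketch below reproduces this with io.StringIO standing in for d.txt (an assumption made so the example runs without files). It also shows that '{0:2f}' refers to argument 0 both times, which would explain why ND and TS both appear as 3.000000 in the output above.

```python
import io

# In-memory stand-in for d.txt (illustrative data only)
sample = "SCALAR\nND 3\nST 0\nTS 1000\n1.0\n1.0\n1.0\n"

def list_numbers_after_ts(n, f):
    """Same logic as in the question: collect the n numbers after each TS line."""
    result = []
    for line in f:
        if line.startswith('TS'):
            for _ in range(n):
                result.append(float(next(f)))
    return result

f = io.StringIO(sample)
for _ in f:                                # a first full pass, as writing_TS does
    pass
exhausted = list_numbers_after_ts(3, f)    # the iterator is already at the end
f.seek(0)                                  # rewind to the start
rewound = list_numbers_after_ts(3, f)
print(exhausted, rewound)

# '{0:2f}' formats argument 0 in both places; '{1...}' selects the second one
header = 'ND {0:2f} TS {0:2f}'.format(3.0, 1000.0)
fixed = 'ND {0:.0f} TS {1:.0f}'.format(3.0, 1000.0)
print(header)
print(fixed)
```

Calling seek(0) after the first pass would let the later reads start from the top again. Note also that list_numbers_after_ts, as written, collects the values after every remaining TS line, not just the next one.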

Answers


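You can use grouper to read the file in groups of a fixed number of lines. As long as the order of the lines within each group stays the same, the following code should work.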

from itertools import zip_longest

# Split an iterable into fixed-size groups
# See http://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks
def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

add_numbers = []

with open("e.txt") as f:
    # Read the data 7 lines at a time
    for lines in grouper(f, 7):
        # Skip the first (SCALAR) line
        for line in lines[1:]:
            # Append the last number on every line (6 elements)
            add_numbers.append(float(line.split()[-1].strip()))

# Zero the ND, ST and TS entries so that only the three data values are
# offset; otherwise ND would be written as 3 + 3 = 6
add_numbers[:3] = [0.0, 0.0, 0.0]

# Template for every group
template = 'SCALAR\nND {:.2f}\nST {:.2f}\nTS {:.2f}\n{:.2f}\n{:.2f}\n{:.2f}\n'

with open("d.txt") as f, open('output.txt', 'w') as out:
    # As before
    for lines in grouper(f, 7):
        data_numbers = []
        for line in lines[1:]:
            data_numbers.append(float(line.split()[-1].strip()))
        # result_numbers holds the pairwise sums of the two lists (6 elements)
        result_numbers = [x + y for x, y in zip(data_numbers, add_numbers)]
        # * unpacks result_numbers as the 6 arguments of format
        out.write(template.format(*result_numbers))
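To see what grouper does on its own, here is a small sketch that runs it on an in-memory list instead of a file. The list contents are made up for illustration, and the second block is deliberately cut short to show the padding.

```python
from itertools import zip_longest

def grouper(iterable, n, padvalue=None):
    # n references to ONE iterator, so each tuple takes n consecutive items
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)

# Illustrative lines; the second block is truncated on purpose
lines = ["SCALAR", "ND 3", "ST 0", "TS 1000", "1.0", "1.0", "1.0",
         "SCALAR", "ND 3", "ST 0", "TS 2000", "3.3"]

groups = list(grouper(lines, 7))
print(len(groups))      # number of 7-line groups, including the padded one
print(groups[0][3])     # the TS line of the first group
print(groups[1][-2:])   # missing lines are filled with padvalue
```

Because the padding is None by default, a file whose length is not a multiple of the group size (for example one with a trailing blank line) would make line.split() fail on the last group, so that case may need a check.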

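I had to change a few small things in the code and now it works, but only for very small input files, because a lot of variables are loaded into memory. Can you tell me how I could make it work with yield?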

from itertools import zip_longest

def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)


def writing_ND(f1):
    # Return the node count from the first ND line
    for line1 in f1:
        if line1.startswith('ND'):
            ND = float(line1.split()[-1])
            return ND


def writing_TS(f):
    # Collect every time step; appends to the module-level TS list
    for line2 in f:
        if line2.startswith('TS'):
            x = float(line2.split()[-1])
            TS.append(x)
    return TS

TS = []
ND = []
x = 0.0
n = 0
add_numbers = []

with open("e.txt") as f, open("d.txt") as f1,\
        open('output.txt', 'w') as out:
    ND = writing_ND(f)
    TS = writing_TS(f1)
    n = int(ND)+4
    f.seek(0)
    # writing_TS read f1 to the end, so rewind it before grouping
    f1.seek(0)
    for lines in grouper(f, int(n)):
        for item in lines[4:]:
            add_numbers.append(float(item))
    i = 0
    for l in grouper(f1, n):
        data_numbers = []
        for line in l[4:]:
            data_numbers.append(float(line.split()[-1].strip()))
        # Sum once per group, after all of its values have been read
        result_numbers = [x + y for x, y in zip(data_numbers, add_numbers)]
        del data_numbers
        out.write('SCALAR\nND %d\nST 0\nTS  %0.2f\n' % (ND, TS[i]))
        i += 1
        for item in result_numbers:
            out.write('%s\n' % item)
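As for yield, one possible approach is a generator that hands back a single SCALAR block at a time, so only one block of the large file is in memory at any moment and no list of all time steps is built. This is a minimal sketch, not a tested solution for the real data: the names read_blocks and add_files are made up for illustration, it assumes e.txt holds exactly one block, it assumes ST is always 0 (as in the code above), and io.StringIO stands in for the real files.

```python
import io

def read_blocks(f):
    """Yield (timestep_text, values) for one SCALAR block at a time."""
    n = 0
    for line in f:
        parts = line.split()
        if not parts:
            continue                      # skip blank lines
        if parts[0] == 'ND':
            n = int(parts[1])             # number of values in this block
        elif parts[0] == 'TS':
            ts = parts[1]                 # keep the time step text as written
            values = [float(next(f)) for _ in range(n)]
            yield ts, values

def add_files(depth, elev, out):
    # The small file has a single block; only its values are kept in memory
    _, offsets = next(read_blocks(elev))
    for ts, values in read_blocks(depth):
        out.write('SCALAR\nND %d\nST 0\nTS %s\n' % (len(values), ts))
        out.writelines('%.2f\n' % (v + o) for v, o in zip(values, offsets))

# In-memory stand-ins for d.txt, e.txt and output.txt
depth = io.StringIO("SCALAR\nND 3\nST 0\nTS 1000\n1.0\n1.0\n1.0\n"
                    "SCALAR\nND 3\nST 0\nTS 2000\n3.3\n3.4\n3.5\n")
elev = io.StringIO("SCALAR\nND 3\nST 0\nTS 0\n10.0\n10.0\n10.0\n")
out = io.StringIO()
add_files(depth, elev, out)
print(out.getvalue())
```

With real data, the three StringIO objects would be replaced by open('d.txt'), open('e.txt') and open('output.txt', 'w') in a with statement. A block that ends before all of its ND values have been read would raise an error inside the generator, so truncated files may need extra handling.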