How to quickly create an array from a large file?

I have, for example:

for line in IN.readlines():
    line = line.rstrip('\n')
    mas = line.split('\t')
    row = (int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4])
    self.inetnums.append(row)
IN.close()

If the file size is 120 MB, the script takes 10 seconds. Can I reduce this time?

2012-04-05 Bdfy

You're reading a 120GB file into memory? How much memory does your machine have? – interjay 2012-04-05 10:36:25

What kind of hard drive does 12GB/s? – 2012-04-05 10:41:24

You might gain some speed if you use a list comprehension

inetnums=[[int(x) for x in line.rstrip('\n').split('\t')] for line in fin]
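
Note that the comprehension above converts every column to int, while the row in the question keeps the last three columns as strings. A minimal sketch that keeps the question's layout, assuming five tab-separated columns per line; the function name `parse_rows` and the sample data are made up for illustration:

```python
import io

def parse_rows(lines):
    """Build (int, int, str, str, str) tuples from tab-separated lines."""
    # Split each line once, then convert only the first two columns to int.
    return [
        (int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4])
        for mas in (line.rstrip('\n').split('\t') for line in lines)
    ]

# Made-up sample data standing in for the real file (an assumption).
sample = io.StringIO("1\t10\ta\tb\tc\n2\t20\td\te\tf\n")
rows = parse_rows(sample)
```

Any iterable of lines works here, so an open file object can be passed in directly.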

Here are two different versions

>>> def foo2():
    fin.seek(0)
    inetnums=[]
    for line in fin:
        line = line.rstrip('\n')
        mas = line.split('\t')
        row = (int(mas[0]), int(mas[1]), mas[2], mas[3])
        inetnums.append(row)


>>> def foo1():
    fin.seek(0)
    inetnums=[[int(x) for x in line.rstrip('\n').split('\t')] for line in fin]

>>> cProfile.run("foo1()")
         444 function calls in 0.004 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.003    0.003    0.004    0.004 <pyshell#362>:1(foo1)
        1    0.000    0.000    0.004    0.004 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      220    0.000    0.000    0.000    0.000 {method 'rstrip' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'seek' of 'file' objects}
      220    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}


>>> cProfile.run("foo2()")
         664 function calls in 0.006 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005    0.006    0.006 <pyshell#360>:1(foo2)
        1    0.000    0.000    0.006    0.006 <string>:1(<module>)
      220    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      220    0.001    0.000    0.001    0.000 {method 'rstrip' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'seek' of 'file' objects}
      220    0.001    0.000    0.001    0.000 {method 'split' of 'str' objects}

2012-04-05 10:56:33 Abhijit

Apart from the speed gained by removing `readlines`, do you really gain any speed by using a list comp? It seems to me it's just another way of writing the same code. – jamylak 2012-04-05 12:00:43

@jamylak: Consider the fact that you are not calling append over and over inside the loop. I have updated my answer with the cProfile information. – Abhijit 2012-04-05 14:51:02
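
The append argument can be checked with `timeit` rather than a single profiler run. This is only a rough sketch on synthetic in-memory data (an assumption, not the asker's file); both functions do the same conversion work, so the only difference measured is the explicit `append` loop versus the comprehension. The names `with_append` and `with_comprehension` are made up, and the timings will vary by machine and Python version.

```python
import timeit

# Synthetic data (an assumption): 10,000 lines of three tab-separated integers.
lines = ["%d\t%d\t%d\n" % (i, i + 1, i + 2) for i in range(10000)]

def with_append():
    """Explicit loop that calls list.append once per line."""
    out = []
    for line in lines:
        out.append([int(x) for x in line.rstrip('\n').split('\t')])
    return out

def with_comprehension():
    """Same work expressed as a single nested list comprehension."""
    return [[int(x) for x in line.rstrip('\n').split('\t')] for line in lines]

# Both build the same list, so only the construction style differs.
assert with_append() == with_comprehension()

t_append = timeit.timeit(with_append, number=20)
t_comp = timeit.timeit(with_comprehension, number=20)
print("append loop:   %.3f s" % t_append)
print("comprehension: %.3f s" % t_comp)
```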

Remove readlines()

Just do

for line in IN:

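Using readlines builds a list of every line in the file before you visit any of them, which you don't need to do. Without it, the for loop iterates over the file object itself, which returns one line at a time.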
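
Put together, the loop from the question might look like the sketch below. It assumes the five-column, tab-separated layout from the question; the function name `load_inetnums` and the temporary sample file are made up for illustration.

```python
import os
import tempfile

def load_inetnums(path):
    """Read a tab-separated file line by line, without readlines()."""
    inetnums = []
    with open(path) as f:      # the file is closed when the block exits
        for line in f:         # the file object yields one line at a time
            mas = line.rstrip('\n').split('\t')
            inetnums.append((int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4]))
    return inetnums

# Throwaway sample file standing in for the real data (an assumption).
with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False) as tmp:
    tmp.write("1\t10\ta\tb\tc\n2\t20\td\te\tf\n")
rows = load_inetnums(tmp.name)
os.remove(tmp.name)
```

The `with` block also replaces the explicit `IN.close()` call, so the file is closed even if a line fails to parse.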

2012-04-05 10:32:01 jamylak
