In Ruby, File.readlines.each is not faster than File.open.each_line. Why?

I am just analyzing my IIS logs (bonus: I happen to know that IIS logs are ASCII-encoded, errrr...).

Here is my Ruby code:

1. readlines

# keywords, extensions, total_count and hit_count are defined elsewhere (not shown)
Dir.glob("*.log").each do |filename|
  File.readlines(filename, :encoding => "ASCII").each do |line|
    # skip comment lines
    next if line[0] == '#'

    line_content = line.downcase
    # only the first matching keyword matters
    matched_keyword = keywords.select { |e| line_content.include? e }[0]
    total_count += 1 if extensions.any? { |e| line_content.include? e }
    hit_count[matched_keyword] += 1 unless matched_keyword.nil?
  end
end

2. open

# keywords, extensions, total_count and hit_count are defined elsewhere (not shown)
Dir.glob("*.log").each do |filename|
  File.open(filename, :encoding => "ASCII").each_line do |line|
    # skip comment lines
    next if line[0] == '#'

    line_content = line.downcase
    # only the first matching keyword matters
    matched_keyword = keywords.select { |e| line_content.include? e }[0]
    total_count += 1 if extensions.any? { |e| line_content.include? e }
    hit_count[matched_keyword] += 1 unless matched_keyword.nil?
  end
end

"readlines" reads the whole file into memory, so why is "open" always slightly faster instead? I tested this several times on Win7 with Ruby 1.9.3.

Source

2013-03-28 rhapsodyn

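Both readlines and open.each_line read the file only once. Ruby buffers reads on IO objects, so each disk access fetches a block of data (for example 64KB) to minimize the cost of disk reads. The disk-reading step should therefore show very little time difference between the two.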

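When you call readlines, Ruby constructs an empty array [] and repeatedly reads one line of the file and pushes it onto that array. At the end it returns the array containing every line of the file.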

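When you call each_line, Ruby reads one line of the file and hands it to your logic. When you have finished processing that line, Ruby reads the next one. It keeps reading lines until there is nothing left in the file.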
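The two behaviours described above can be sketched in plain Ruby. This is only an illustration: sketch_readlines and sketch_each_line are hypothetical helpers, not Ruby's actual implementation (which is written in C), and a StringIO stands in for a real log file.

```ruby
require "stringio"

# Hypothetical helper: mimics readlines by collecting every line into an
# array before the caller sees any of them.
def sketch_readlines(io)
  lines = []
  while (line = io.gets)
    lines << line
  end
  lines
end

# Hypothetical helper: mimics each_line by handing each line to the block
# as soon as it has been read, without storing it.
def sketch_each_line(io)
  while (line = io.gets)
    yield line
  end
end

io = StringIO.new("a\nb\nc\n")
collected = sketch_readlines(io)

io.rewind
streamed = []
sketch_each_line(io) { |line| streamed << line }
```

Both helpers see the same lines in the same order; the only difference is whether an intermediate array is built.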

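The difference between the two approaches is that readlines has to append each line to an array. When the file is large, Ruby may need to copy the underlying array (at the C level) one or more times in order to enlarge it.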
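One rough way to observe that growth, assuming CRuby and its objspace extension, is to watch the reported memory size of an array while lines are pushed onto it. The line content and the count of 10,000 are made up for illustration, and the exact number of size changes depends on the Ruby version.

```ruby
require "objspace"

lines = []
resize_points = []
last_size = ObjectSpace.memsize_of(lines)

10_000.times do |i|
  lines << "line #{i}\n"
  size = ObjectSpace.memsize_of(lines)
  # A change in reported size suggests the backing buffer was reallocated.
  if size != last_size
    resize_points << lines.length
    last_size = size
  end
end

puts "pushes: #{lines.length}, buffer size changes: #{resize_points.length}"
```

The buffer changes size far less often than once per push, so the cost is amortized, but it is still work that each_line never has to do.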

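Digging into the source code, readlines is implemented by io_s_readlines, which calls rb_io_readlines. rb_io_readlines calls rb_io_getline_1 to fetch each line and rb_ary_push to push the result onto the array it returns.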

each_line is implemented by rb_io_each_line, which calls rb_io_getline_1 to fetch each line just as readlines does, and then yields the line to your logic with rb_yield.

So each_line has no need to store the lines in an array, and there is no array resizing or copying to pay for.
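To check this on your own machine, a minimal benchmark sketch along these lines may help. It assumes a throwaway Tempfile filled with made-up log lines rather than real IIS logs, and the timings will vary with Ruby version, platform, and file size.

```ruby
require "benchmark"
require "tempfile"

# Build a throwaway log file; the line content and count are invented.
sample = Tempfile.new(["sample", ".log"])
20_000.times { |i| sample.puts("2013-03-28 10:41:56 GET /page#{i}.aspx 200") }
sample.close

readlines_count = 0
each_line_count = 0

Benchmark.bm(10) do |bm|
  bm.report("readlines") do
    # Builds the full array of lines first, then iterates over it.
    File.readlines(sample.path).each do |line|
      readlines_count += 1 unless line.start_with?("#")
    end
  end

  bm.report("each_line") do
    # Streams one line at a time; the block form closes the file afterwards.
    File.open(sample.path) do |file|
      file.each_line do |line|
        each_line_count += 1 unless line.start_with?("#")
      end
    end
  end
end

sample.unlink
```

Both reports process the same number of lines, so any gap between them comes from how the lines are delivered rather than from the per-line work.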

Source

2013-03-28 10:41:56
