2010-05-24 258 views

Parsing a file in Ruby

I have a data file like this:

01 JUL something 
     something 
     something    445 
     something else 
01 JUL whatever 
     everwa3 
     lklkj     445 
     something else 
02 JUL ljkjlkj 
     ljkljlkj 
     lkjkjlk    500 
     lkjkj 
02 JUL ljlkjklj 
     lkjkjlkj 
     lkjkj     500 
     lkjlkj 

Ultimately, I want to find out how many occurrences there are of 01 JUL 445 and how many of 02 JUL 500.

In this case, that would be:

01 JUL 445 = 2 

02 JUL 500 = 2 

I'm able to read the lines and get the data... how do I go about counting them?

Answers

counts = Hash.new(0)        # unseen keys start at 0
date = ""
# `file` is assumed to be an already opened File object
file.readlines.each_with_index do |l, i|
  if i % 4 == 0             # first line of a 4-line record: starts with the date
    date = l.split.first(2).join(" ")
  elsif i % 4 == 2          # third line of the record: ends with the number
    weird_num = l.split.last
    counts["#{date} #{weird_num}"] += 1
  end
end

p counts # => {"01 JUL 445" => 2, "02 JUL 500" => 2}
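
If the records are not always exactly four lines long, counting by line index breaks down. Here is a minimal sketch of an alternative that keys off the content of each line instead. It assumes (this is not stated in the question) that a record starts on a line beginning with a two-digit day and a three-letter uppercase month, and that the number is the last whitespace-separated field on a continuation line. The names `DATE_LINE`, `TRAILING_NUMBER`, `SAMPLE` and `count_occurrences` are made up for illustration.

```ruby
# Assumed pattern for a record's first line, e.g. "01 JUL something".
DATE_LINE = /\A(\d{2} [A-Z]{3})\b/
# Assumed pattern for a line ending in a standalone number, e.g. "   lklkj   445".
TRAILING_NUMBER = /\s(\d+)\s*\z/

# Sample data copied from the question.
SAMPLE = <<~DATA
  01 JUL something
       something
       something    445
       something else
  01 JUL whatever
       everwa3
       lklkj     445
       something else
  02 JUL ljkjlkj
       ljkljlkj
       lkjkjlk    500
       lkjkj
  02 JUL ljlkjklj
       lkjkjlkj
       lkjkj     500
       lkjlkj
DATA

# Returns a hash mapping "DD MON number" to the number of times it occurs.
def count_occurrences(lines)
  counts = Hash.new(0)
  date = nil
  lines.each do |line|
    if (m = DATE_LINE.match(line))
      date = m[1]                      # remember the date of the current record
    elsif date && (m = TRAILING_NUMBER.match(line))
      counts["#{date} #{m[1]}"] += 1   # number found on a continuation line
    end
  end
  counts
end

count_occurrences(SAMPLE.lines).each { |key, n| puts "#{key} = #{n}" }
```

For a real file you could pass `File.foreach(path)` instead of `SAMPLE.lines`, which avoids loading the whole file into memory. Note that a token such as `everwa3` is not counted, because the digits have to be preceded by whitespace.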

Thanks. Although now I've run into a problem with UTF-8 characters. See http://stackoverflow.com/questions/2897398/broken-utf-8-string-ruby – josh 2010-05-24 13:51:56