2010-10-17 77 views
0

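Ruby: hash assignment/parsing from text

I'm looking to create a hash table from a text output that looks like this (the whitespace between words is tabs):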

GCOLLECTOR  123456  77889  uno 
BLOCK  unique111 error  fullunique111  ...  ...  ... 
DAY  ... ... ... 
LABEL  detail  unique111  Issue  Broken - The truck broke 
LABEL  detail  unique111  Folder 3c1 
LABEL  detail  unique111  Datum  bar_1666.9 
GCOLLECTOR  234567  77889  uno 
BLOCK  unique222 error  fullunique111  ...  ...  ... 
DAY  ... ... ... 
DAY  ... ... ... 
LABEL  detail  unique222  Issue  Broken - The truck broke 
LABEL  detail  unique222  Datum  bar_9921.2 
LABEL  detail  unique222  Folder 6a3 
GCOLLECTOR  345678  77889  uno 
BLOCK  unique333 error  fullunique111  ...  ...  ...  
LABEL  detail  unique333  Datum  bar_7766.2 
LABEL  detail  unique333  Folder 49k 
LABEL  detail  unique333  Issue  Broken - The truck broke 

I would like to create a hash table that assigns each of the following into a hash:
gcollectors = {
    "UniqueID"     => "uniqueXXX",
    "Datum"        => "bar_XXXX.X",
    "FullUniqueID" => "fulluniqueXXX",
    "IssueGroup"   => "Broken"
}

The uniqueXXX field is always the same for the BLOCK and its associated LABELs.

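I'm running into a few problems:
1- How do I assign these fields to the hash?
2- How do I split off the text before the hyphen (in LABEL ... Issue) and assign it to IssueGroup?
3- How do I do this reliably when the LABEL lines are in a different order?
..The same question applies when there are multiple DAY lines or no DAY lines.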

Answers

0

Here is how I would go about it:

records     = [] # init an array to hold everything
gcollectors = {} # init the hash holding info for one record

# loop over the file
File.readlines('text.txt').each do |l|

    # split the line into tab-separated columns
    columns = l.chomp.split("\t")

    # if the first column is...
    case columns[0]
    when 'GCOLLECTOR'
        # we don't care about the columns, but instead use this line to tell us to
        # store the previous hash and reinitialize it.
        if gcollectors.any?
            records << gcollectors
            gcollectors = {}
        end
    when 'BLOCK'
        gcollectors['UniqueID']     = columns[1]
        gcollectors['FullUniqueID'] = columns[3]
    when 'LABEL'
        # a LABEL record could have two different values we care about so figure out
        # which it is. Other labels (such as Folder) are ignored.
        case columns[3]
        when 'Datum'
            gcollectors['Datum'] = columns[4]
        when 'Issue'
            gcollectors['IssueGroup'] = columns[4].split('-').first.strip
        end
    end
end

# store the last record; no GCOLLECTOR line follows it to trigger the push
records << gcollectors if gcollectors.any?

require 'ap' # from the awesome_print gem
ap records
# >> [
# >>     [0] {
# >>             "UniqueID" => "unique111",
# >>         "FullUniqueID" => "fullunique111",
# >>           "IssueGroup" => "Broken",
# >>                "Datum" => "bar_1666.9"
# >>     },
# >>     [1] {
# >>             "UniqueID" => "unique222",
# >>         "FullUniqueID" => "fullunique111",
# >>           "IssueGroup" => "Broken",
# >>                "Datum" => "bar_9921.2"
# >>     },
# >>     [2] {
# >>             "UniqueID" => "unique333",
# >>         "FullUniqueID" => "fullunique111",
# >>                "Datum" => "bar_7766.2",
# >>           "IssueGroup" => "Broken"
# >>     }
# >> ]

Thanks, perfect! – user453366 2010-10-17 23:32:10


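I've needed this ability many times over the years. Not all input data is symmetrical or, unfortunately, in a standard/constant format, so we have to find ways to determine the start or end of the block of lines that makes up a record. – 2010-10-17 23:48:06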
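
For what it's worth, the "find the start of each record" idea can also be expressed with Enumerable#slice_before (available since Ruby 1.9.2), which starts a new group at every line matching a condition. The following is only a sketch under assumptions: the `rows` sample data and the `parse_records` helper are made up for illustration, and the column positions are taken from the sample in the question.

```ruby
# Hypothetical sample rows; joined with tabs to mimic the input in the question.
rows = [
  %w[GCOLLECTOR 123456 77889 uno],
  %w[BLOCK unique111 error fullunique111],
  ['LABEL', 'detail', 'unique111', 'Issue', 'Broken - The truck broke'],
  ['LABEL', 'detail', 'unique111', 'Datum', 'bar_1666.9'],
  %w[GCOLLECTOR 234567 77889 uno],
  %w[BLOCK unique222 error fullunique111],
  %w[DAY x y z],
  ['LABEL', 'detail', 'unique222', 'Datum', 'bar_9921.2'],
  ['LABEL', 'detail', 'unique222', 'Issue', 'Broken - The truck broke']
]
lines = rows.map { |row| row.join("\t") }

# parse_records is an illustrative helper name, not part of any library.
def parse_records(lines)
  # Start a new group of lines whenever a GCOLLECTOR line is seen.
  groups = lines.slice_before { |line| line.start_with?('GCOLLECTOR') }

  groups.map do |group|
    # Build one hash per group; the order of LABEL lines does not matter
    # and DAY lines are simply skipped.
    group.each_with_object({}) do |line, record|
      columns = line.chomp.split("\t")
      case columns[0]
      when 'BLOCK'
        record['UniqueID']     = columns[1]
        record['FullUniqueID'] = columns[3]
      when 'LABEL'
        case columns[3]
        when 'Datum' then record['Datum'] = columns[4]
        when 'Issue' then record['IssueGroup'] = columns[4].split('-', 2).first.strip
        end
      end
    end
  end
end

records = parse_records(lines)
records.each { |record| p record }
```

Because every group is closed when the input ends, the last record needs no special handling after the loop.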


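Exactly. If the order were the same for every record I could have figured it out, but I like your solution better. Do you have a quick way to take each unique value (i.e. IssueGroup), display it, and then count it? Again, I really appreciate the help. – user453366 2010-10-18 00:14:45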
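
One possible way to do the counting asked about here is a hash with a default value of 0, incremented once per record. This is a minimal sketch: the `records` array below is hand-written in the shape the answer's code produces, and the "Missing" value is invented purely so that there is more than one group to count.

```ruby
# Hypothetical records in the same shape as the parsed output.
records = [
  { 'UniqueID' => 'unique111', 'IssueGroup' => 'Broken' },
  { 'UniqueID' => 'unique222', 'IssueGroup' => 'Broken' },
  { 'UniqueID' => 'unique333', 'IssueGroup' => 'Missing' } # made-up value
]

# Hash.new(0) returns 0 for unseen keys, so += 1 works on first sight.
issue_counts = Hash.new(0)
records.each { |record| issue_counts[record['IssueGroup']] += 1 }

# Display each distinct value with its count.
issue_counts.each { |group, count| puts "#{group}: #{count}" }
# >> Broken: 2
# >> Missing: 1
```

The keys of `issue_counts` are the distinct IssueGroup values, so `issue_counts.keys` gives the unique list on its own.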

0
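# read the whole file into one string (file name assumed, as in the other answer)
text = File.read('text.txt')

# grab each GCOLLECTOR line together with the BLOCK/DAY/LABEL lines that follow it
gcollectors = text.scan(/^GCOLLECTOR.+\n(?:(?:BLOCK|DAY|LABEL).+\n?)+/).map { |collector|
    # with the regex literal on the left of =~, named captures become
    # local variables (Ruby 1.9+)
    /^BLOCK\t(?<uniqueid>\S+)\t\S+\t(?<fulluniqueid>\S+).+/ =~ collector
    /^LABEL\t\S+\t\S+\tDatum\t(?<datum>.+)/ =~ collector
    /^LABEL\t\S+\t\S+\tIssue\t(?<issue>\S+)/ =~ collector
    Hash[
        "UniqueID", uniqueid,
        "Datum", datum,
        "FullUniqueID", fulluniqueid,
        "IssueGroup", issue
    ]
}

gcollectors.each { |i| p i }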
{"UniqueID"=>"unique111", "Datum"=>"bar_1666.9", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"} 
{"UniqueID"=>"unique222", "Datum"=>"bar_9921.2", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"} 
{"UniqueID"=>"unique333", "Datum"=>"bar_7766.2", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"}
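
One thing to watch with the `\S+` capture for the issue: it stops at the first whitespace, so it only returns the full group when the text before the hyphen is a single word. If a group could span several words, a possible variation is to capture the whole field and split on the hyphen. This is just a sketch; the "Not Working" line below is a made-up example, not data from the question.

```ruby
# Hypothetical LABEL line whose issue group has two words.
line = "LABEL\tdetail\tunique111\tIssue\tNot Working - The truck broke"

# String#[] with a regex and a capture index returns that capture group.
first_word = line[/^LABEL\t\S+\t\S+\tIssue\t(\S+)/, 1]
p first_word  # >> "Not"

# Capture the whole field, then keep what precedes the first hyphen.
issue_text  = line[/^LABEL\t\S+\t\S+\tIssue\t(.+)/, 1]
issue_group = issue_text.split('-', 2).first.strip
p issue_group # >> "Not Working"
```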

Thanks, but I like Greg's answer better. – user453366 2010-10-17 23:32:32