Ruby: Parsing/assigning to a hash from text

I'm looking to create a hash from text output that looks like this (the whitespace between words is tabs):

GCOLLECTOR  123456  77889  uno 
BLOCK  unique111 error  fullunique111  ...  ...  ... 
DAY  ... ... ... 
LABEL  detail  unique111  Issue  Broken - The truck broke 
LABEL  detail  unique111  Folder 3c1 
LABEL  detail  unique111  Datum  bar_1666.9 
GCOLLECTOR  234567  77889  uno 
BLOCK  unique222 error  fullunique111  ...  ...  ... 
DAY  ... ... ... 
DAY  ... ... ... 
LABEL  detail  unique222  Issue  Broken - The truck broke 
LABEL  detail  unique222  Datum  bar_9921.2 
LABEL  detail  unique222  Folder 6a3 
GCOLLECTOR  345678  77889  uno 
BLOCK  unique333 error  fullunique111  ...  ...  ...  
LABEL  detail  unique333  Datum  bar_7766.2 
LABEL  detail  unique333  Folder 49k 
LABEL  detail  unique333  Issue  Broken - The truck broke

I want to create a hash and assign each of the following to it:
gcollectors = Hash.new
gcollectors = { "UniqueID" => uniqueXXX, "Datum" => bar_XXXX.X, "FullUniqueID" => fulluniqueXXX, "IssueGroup" => Broken }

The uniqueXXX field is always the same for a BLOCK and its associated LABEL lines.

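I'm running into a few problems:
1- How do I assign these fields to the hash?
2- How do I split off the text before the hyphen (in LABEL ... Issue) and assign it to IssueGroup?
3- How do I do this reliably when the LABEL lines come in a different order?
.. Same question for when there are multiple DAY lines or no DAY lines at all.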

2010-10-17 user453366

Here is how I would go about it:

records     = [] # array holding one hash per record
gcollectors = {} # hash holding the fields of the record being read

# loop over the file
File.readlines('text.txt').each do |line|

  # split the line into columns
  columns = line.chomp.split("\t")

  # if the first column is...
  case columns[0]
  when 'GCOLLECTOR'
    # we don't care about the columns, but instead use this line to tell us to
    # store the current hash and reinitialize it.
    if gcollectors.any?
      records << gcollectors
      gcollectors = {}
    end
  when 'BLOCK'
    gcollectors['UniqueID']     = columns[1]
    gcollectors['FullUniqueID'] = columns[3]
  when 'LABEL'
    # a LABEL line could hold two different values we care about, so figure out
    # which it is.
    case columns[3]
    when 'Datum'
      gcollectors['Datum'] = columns[4]
    when 'Issue'
      gcollectors['IssueGroup'] = columns[4].split('-').first.strip
    end
  end
end

# store the last record, which has no following GCOLLECTOR line to flush it
records << gcollectors if gcollectors.any?

require 'ap'
ap records
# >> [
# >>     [0] {
# >>             "UniqueID" => "unique111",
# >>         "FullUniqueID" => "fullunique111",
# >>           "IssueGroup" => "Broken",
# >>                "Datum" => "bar_1666.9"
# >>     },
# >>     [1] {
# >>             "UniqueID" => "unique222",
# >>         "FullUniqueID" => "fullunique111",
# >>           "IssueGroup" => "Broken",
# >>                "Datum" => "bar_9921.2"
# >>     },
# >>     [2] {
# >>             "UniqueID" => "unique333",
# >>         "FullUniqueID" => "fullunique111",
# >>                "Datum" => "bar_7766.2",
# >>           "IssueGroup" => "Broken"
# >>     }
# >> ]
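
The loop above finds record boundaries by watching for GCOLLECTOR lines. As a rough alternative sketch (not part of the original answer), Enumerable#slice_before can group the lines into one chunk per record first, so no record is left waiting to be stored at the end. The inline sample data and the names `text`, `chunks` and `record` are assumptions made for illustration.

```ruby
# Sample input inlined so the sketch is self-contained (assumption); in
# practice the lines would come from File.readlines('text.txt').
text = <<~DATA
  GCOLLECTOR\t123456\t77889\tuno
  BLOCK\tunique111\terror\tfullunique111
  DAY\t...
  LABEL\tdetail\tunique111\tIssue\tBroken - The truck broke
  LABEL\tdetail\tunique111\tDatum\tbar_1666.9
  GCOLLECTOR\t234567\t77889\tuno
  BLOCK\tunique222\terror\tfullunique111
  LABEL\tdetail\tunique222\tDatum\tbar_9921.2
  LABEL\tdetail\tunique222\tIssue\tBroken - The truck broke
DATA

# Start a new chunk every time a GCOLLECTOR line is seen.
lines  = text.lines.map(&:chomp)
chunks = lines.slice_before { |line| line.start_with?('GCOLLECTOR') }

# Turn each chunk of lines into one hash.
records = chunks.map do |chunk|
  chunk.each_with_object({}) do |line, record|
    columns = line.split("\t")
    case columns[0]
    when 'BLOCK'
      record['UniqueID']     = columns[1]
      record['FullUniqueID'] = columns[3]
    when 'LABEL'
      case columns[3]
      when 'Datum' then record['Datum'] = columns[4]
      when 'Issue' then record['IssueGroup'] = columns[4].split('-').first.strip
      end
    end
  end
end

records.each { |record| p record }
```

Because each chunk is handled on its own, the order of the LABEL lines and the number of DAY lines inside a chunk make no difference.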

2010-10-17 07:05:22

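Thanks, perfect! – user453366 2010-10-17 23:32:10

I've needed this ability many times over the years. Not all input data is symmetrical or, unfortunately, in a standard/constant format, so we have to find ways to determine the start or end of the blocks that make up a record. – 2010-10-17 23:48:06

Exactly. If every record came in the same order I could have worked it out, but I prefer your solution. Do you have a quick way to record each unique value (i.e. IssueGroup), display it, and then count it? Again, I really appreciate the help. – user453366 2010-10-18 00:14:45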
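
One possible way to count each distinct IssueGroup, sketched here under assumptions since the thread does not include an answer to this follow-up: the `records` array below is hypothetical sample data shaped like the hashes from the answer, and the "Missing" value is invented purely to show a second group.

```ruby
# Hypothetical sample shaped like the hashes built in the answer.
records = [
  { 'UniqueID' => 'unique111', 'IssueGroup' => 'Broken' },
  { 'UniqueID' => 'unique222', 'IssueGroup' => 'Broken' },
  { 'UniqueID' => 'unique333', 'IssueGroup' => 'Missing' } # invented value
]

# Hash.new(0) makes every unseen key start at zero, so += 1 just works.
issue_counts = records.each_with_object(Hash.new(0)) do |record, counts|
  counts[record['IssueGroup']] += 1
end

# Display each unique value with its count.
issue_counts.each { |group, count| puts "#{group}: #{count}" }
# prints:
# Broken: 2
# Missing: 1
```

On Ruby 2.7 or later, `records.map { |r| r['IssueGroup'] }.tally` should give the same counts in one call.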

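text = File.read('text.txt') # assumption: same input file as in the other answer

# grab each GCOLLECTOR line together with the BLOCK/DAY/LABEL lines that follow it
gcollectors = text.scan(/^GCOLLECTOR.+\n(?:(?:BLOCK|DAY|LABEL).+\n?)+/).map do |collector|
  # a regex literal on the left of =~ assigns its named groups to local variables
  /^BLOCK\t(?<uniqueid>\S+)\t\S+\t(?<fulluniqueid>\S+)/ =~ collector
  /^LABEL\t\S+\t\S+\tDatum\t(?<datum>.+)/ =~ collector
  /^LABEL\t\S+\t\S+\tIssue\t(?<issue>\S+)/ =~ collector
  {
    "UniqueID"     => uniqueid,
    "Datum"        => datum,
    "FullUniqueID" => fulluniqueid,
    "IssueGroup"   => issue
  }
end

gcollectors.each { |collector| p collector }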

{"UniqueID"=>"unique111", "Datum"=>"bar_1666.9", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"} 
{"UniqueID"=>"unique222", "Datum"=>"bar_9921.2", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"} 
{"UniqueID"=>"unique333", "Datum"=>"bar_7766.2", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"}

2010-10-17 10:58:24 Nakilon

Thanks, but I prefer Greg's answer. – user453366 2010-10-17 23:32:32
