2013-02-19
0

I'm using Ruby and trying to parse a dictionary text file of the following form:

AAB eel bbc 
ABA did eye non pap mom ere bob nun eve pip gig dad nan ana gog aha 
    mum sis ada ava ewe pop tit gag tat bub pup 
    eke ele hah huh pep sos tot wow aba ala 
    bib dud tnt 
ABB all see off too ill add lee ass err xii ann fee vii inn egg odd bee dee goo 
    woo cnn pee fcc tee wee ebb edd gee ott ree vee ell orr rcc att boo cee cii 
    coo kee moo mss soo doo faa hee icc iss itt kii loo mee nee nuu ogg opp pii 
    tll upp voo zee 

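I need to be able to search by the first column, e.g. "AAB", and then search through all the values associated with that key. I tried importing the text file into a hash of arrays, but it never gets past the first stored value. I have no preference as to how the file is searched, whether the data is stored in some data structure or the text file is searched each time; I just need to be able to do it. I'm at a loss as to how to proceed, and any help would be greatly appreciated. Thanks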

-amc25114

Answers

3

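This will read your dictionary file. I store the contents in a string and then turn it into a StringIO object so I can pretend it is a file. You can use File.readlines to read directly from the file itself: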

require 'pp' 
require 'stringio' 

text = 'AAB eel bbc 
ABA did eye non pap mom ere bob nun eve pip gig dad nan ana gog aha 
    mum sis ada ava ewe pop tit gag tat bub pup 
    eke ele hah huh pep sos tot wow aba ala 
    bib dud tnt 
ABB all see off too ill add lee ass err xii ann fee vii inn egg odd bee dee goo 
    woo cnn pee fcc tee wee ebb edd gee ott ree vee ell orr rcc att boo cee cii 
    coo kee moo mss soo doo faa hee icc iss itt kii loo mee nee nuu ogg opp pii 
    tll upp voo zee 
' 

file = StringIO.new(text) 

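# slice_before starts a new group at every line that begins with a
# non-space character, so each key line is grouped with its indented
# continuation lines.
dictionary = Hash[ 
    file.readlines.slice_before(/^\S/).map{ |ary| 
        # Join the group into one string; the first word is the key.
        key, *values = ary.map(&:strip).join(' ').split(' ') 
        [key, values] 
    } 
] 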

dictionary is a hash that looks like:

{ 
    "AAB"=>[ 
    "eel", "bbc" 
    ], 
    "ABA"=>[ 
    "did", "eye", "non", "pap", "mom", "ere", "bob", "nun", "eve", "pip", 
    "gig", "dad", "nan", "ana", "gog", "aha", "mum", "sis", "ada", "ava", 
    "ewe", "pop", "tit", "gag", "tat", "bub", "pup", "eke", "ele", "hah", 
    "huh", "pep", "sos", "tot", "wow", "aba", "ala", "bib", "dud", "tnt" 
    ], 
    "ABB"=>[ 
    "all", "see", "off", "too", "ill", "add", "lee", "ass", "err", "xii", 
    "ann", "fee", "vii", "inn", "egg", "odd", "bee", "dee", "goo", "woo", 
    "cnn", "pee", "fcc", "tee", "wee", "ebb", "edd", "gee", "ott", "ree", 
    "vee", "ell", "orr", "rcc", "att", "boo", "cee", "cii", "coo", "kee", 
    "moo", "mss", "soo", "doo", "faa", "hee", "icc", "iss", "itt", "kii", 
    "loo", "mee", "nee", "nuu", "ogg", "opp", "pii", "tll", "upp", "voo", "zee" 
    ] 
} 
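
To point the same parsing at a real file rather than a StringIO, something along these lines should work. This is only a sketch: `load_dictionary` is a hypothetical helper name, and the temporary file with shortened sample data is there only so the example runs on its own.

```ruby
require 'tempfile'

# Hypothetical helper (not part of the answer above): build the hash
# straight from a file on disk using File.readlines.
def load_dictionary(path)
  File.readlines(path).slice_before(/^\S/).map { |lines|
    # String#split with no argument splits on any whitespace, so the
    # leading indentation and trailing newlines disappear here.
    key, *values = lines.join(' ').split
    [key, values]
  }.to_h
end

# Shortened sample data written to a temporary file (an assumption made
# for illustration; use the path of your own dictionary file instead).
sample = Tempfile.new('dictionary')
sample.write("AAB eel bbc\nABA did eye non\n    mum sis\n")
sample.close

dictionary = load_dictionary(sample.path)
p dictionary['ABA']
```

`to_h` on an array of pairs needs Ruby 2.1 or later; on older versions wrap the result in `Hash[...]` as above.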

You can look up the values using a key:

 
dictionary['AAB'] 
=> ["eel", "bbc"] 

And search inside the array using include?:

 
dictionary['AAB'].include?('eel') 
=> true 
dictionary['AAB'].include?('foo') 
=> false 
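
One thing to watch: looking up a key that is not in the hash returns nil, so calling include? on the result raises NoMethodError. A minimal sketch of a guard, assuming a hypothetical helper named `listed?` and a shortened copy of the hash:

```ruby
# Shortened sample of the hash built above.
dictionary = {
  'AAB' => %w[eel bbc],
  'ABA' => %w[did eye non]
}

# Hypothetical helper: true only when the key exists and lists the word.
# Hash#fetch with a default avoids calling include? on nil.
def listed?(dictionary, key, word)
  dictionary.fetch(key, []).include?(word)
end

p listed?(dictionary, 'AAB', 'eel')  # key and word both present
p listed?(dictionary, 'AAB', 'foo')  # key present, word absent
p listed?(dictionary, 'XYZ', 'eel')  # key absent, no exception raised
```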
0
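class A 

    def initialize 
        key = nil # most recent key; continuation lines are appended to it 
        # Kernel#readlines reads the files named on the command line, or STDIN 
        @h = readlines.each_with_object({}) do |s, m| 
            a = s.split 
            m[key = a.shift] = [] if s =~ /^\S/ # an unindented line starts a new key 
            m[key] += a 
        end 
    end 

    def lookup k, v # not sure what you really want to do here 
        p [k, v, (@h[k].index v)] 
    end 

    self 
end.new.lookup 'ABA', 'wow' 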
0

My 2 cents:

recent_key = "" 
results = Hash.new 
# File.foreach yields one line at a time and closes the file afterwards 
File.foreach("/path_to_file_here") do |line| 
    # A run of uppercase letters is a key; lines without one continue the previous key 
    key = line[/[A-Z]+/] 
    recent_key = key if key 
    line.scan(/[a-z]+/).each do |val| 
        results[recent_key.to_sym] ||= [] 
        results[recent_key.to_sym] << val 
    end 
end 
puts results 

This gives you the following output:

 
{ 

:AAB=>["eel", "bbc"], 

:ABA=>["did", "eye", "non", "pap", "mom", "ere", "bob", "nun", "eve", "pip", "gig", "dad", "nan", "ana", "gog", "aha", "mum", "sis", "ada", "ava", "ewe", "pop", "tit", "gag", "tat", "bub", "pup", "eke", "ele", "hah", "huh", "pep", "sos", "tot", "wow", "aba", "ala", "bib", "dud", "tnt"], 

:ABB=>["all", "see", "off", "too", "ill", "add", "lee", "ass", "err", "xii", "ann", "fee", "vii", "inn", "egg", "odd", "bee", "dee", "goo", "woo", "cnn", "pee", "fcc", "tee", "wee", "ebb", "edd", "gee", "ott", "ree", "vee", "ell", "orr", "rcc", "att", "boo", "cee", "cii", "coo", "kee", "moo", "mss", "soo", "doo", "faa", "hee", "icc", "iss", "itt", "kii", "loo", "mee", "nee", "nuu", "ogg", "opp", "pii", "tll", "upp", "voo", "zee"] 

}
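
For the other option raised in the question, searching the text file on every lookup instead of keeping a hash in memory, a rough sketch could look like this. `values_for` is a hypothetical helper name, and the temporary file holds shortened sample data only so the example is self-contained.

```ruby
require 'tempfile'

# Hypothetical helper: scan the file and return the values for one key,
# or nil when the key never appears.
def values_for(path, wanted)
  values = nil
  File.foreach(path) do |line|
    if line =~ /^\S/               # unindented line: a new key starts here
      break if values              # the wanted entry has already ended
      key, *rest = line.split
      values = rest if key == wanted
    elsif values
      values.concat(line.split)    # continuation line of the wanted entry
    end
  end
  values
end

# Shortened sample data (an assumption made for illustration).
sample = Tempfile.new('dictionary')
sample.write("AAB eel bbc\nABA did eye\n    mum sis\nABB all see\n")
sample.close

p values_for(sample.path, 'ABA')
p values_for(sample.path, 'ZZZ')
```

This rereads the file on each call, so it only makes sense when lookups are rare or the file is too large to hold in memory; otherwise building the hash once is simpler.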