2010-03-04 91 views

Ruby: parsing a string

I have a string:

input = "maybe (this is | that was) some ((nice | ugly) (day |night) | (strange (weather | time)))" 

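What is the best way to parse this string in Ruby?

I mean the script should be able to build sentences like these:

maybe this is some ugly night

maybe this is some nice night

maybe this is some strange time

And so on, you get the idea...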

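Should I read the string character by character and build a state machine with a stack that stores the parenthesised values for later evaluation, or is there a better approach?

Perhaps there is a ready-made, out-of-the-box library for this purpose?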
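
For comparison, the hand-rolled approach asked about above can be sketched in plain Ruby without any library. This is only a sketch under assumptions: `TemplateExpander` and its method names are hypothetical, recursion takes the place of an explicit stack, and whitespace is collapsed at the end so that the spaces around `|` do not leak into the results.

```ruby
# Hypothetical helper, not part of any library: expands templates such as
# "a (b | c) d" into every sentence they describe.
class TemplateExpander
  def initialize(input)
    @chars = input.chars
    @pos = 0
  end

  def expand
    results = parse_alternatives
    # At the top level, parsing only stops early on an unmatched ")".
    raise ArgumentError, "unexpected ')' at position #{@pos}" if @pos < @chars.length
    # Collapse runs of whitespace and trim both ends.
    results.map { |s| s.split.join(" ") }
  end

  private

  # alternatives := sequence ("|" sequence)*
  def parse_alternatives
    results = parse_sequence
    while peek == "|"
      @pos += 1
      results += parse_sequence # alternatives are simply concatenated
    end
    results
  end

  # sequence := (literal | "(" alternatives ")")*
  def parse_sequence
    results = [""]
    until peek.nil? || peek == "|" || peek == ")"
      part = peek == "(" ? parse_group : [parse_literal]
      # Cross product: every prefix so far followed by every new option.
      results = results.product(part).map(&:join)
    end
    results
  end

  def parse_group
    @pos += 1 # consume "("
    inner = parse_alternatives
    raise ArgumentError, "missing ')' at position #{@pos}" unless peek == ")"
    @pos += 1 # consume ")"
    inner
  end

  def parse_literal
    start = @pos
    @pos += 1 until peek.nil? || "()|".include?(peek)
    @chars[start...@pos].join
  end

  def peek
    @chars[@pos]
  end
end

input = "maybe (this is | that was) some ((nice | ugly) (day |night) | (strange (weather | time)))"
puts TemplateExpander.new(input).expand
```

This prints the twelve combinations, one per line. It works for this small format, but every new feature (escapes, optional parts, other characters) has to be coded by hand, which is where a grammar-based parser tends to pay off.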

Answers

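Try Treetop. It is a Ruby-like DSL for describing grammars. Parsing the string you have given should be quite easy, and because you are using a real parser, you can easily extend your grammar later.

An example grammar for the type of string you want to parse (save as sentences.treetop):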

grammar Sentences
  rule sentence
    # A sentence is a sequence of zero or more expressions.
    expression* <Sentence>
  end

  rule expression
    # An expression is either a parenthesised expression or a literal.
    parenthesised / literal
  end

  rule parenthesised
    # A parenthesised expression contains one or more sentences.
    "(" (multiple / sentence) ")" <Parenthesised>
  end

  rule multiple
    # Multiple sentences are delimited by a pipe.
    sentence "|" (multiple / sentence) <Multiple>
  end

  rule literal
    # A literal string consists of letters (a-z, A-Z) and/or spaces.
    # Expand the character class to allow other characters too.
    [a-zA-Z ]+ <Literal>
  end
end

The grammar above needs an accompanying file that defines the classes through which we access the node values (save as sentence_nodes.rb).

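class Sentence < Treetop::Runtime::SyntaxNode
  # Appends every string in b to every string in a (a cross product).
  # An empty a means nothing has been accumulated yet.
  def combine(a, b)
    return b if a.empty?
    a.inject([]) do |values, val_a|
      values + b.collect { |val_b| val_a + val_b }
    end
  end

  # All strings this sentence can expand to.
  def values
    elements.inject([]) do |values, element|
      combine(values, element.values)
    end
  end
end

class Parenthesised < Treetop::Runtime::SyntaxNode
  # elements[0] is "(", elements[1] the contents, elements[2] is ")".
  def values
    elements[1].values
  end
end

class Multiple < Treetop::Runtime::SyntaxNode
  # Alternatives on either side of the pipe (elements[1]) are concatenated.
  def values
    elements[0].values + elements[2].values
  end
end

class Literal < Treetop::Runtime::SyntaxNode
  # A literal expands only to its own text.
  def values
    [text_value]
  end
end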
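
To see what `combine` and `Multiple#values` compute without involving Treetop, the same two operations can be tried on plain arrays. This is a minimal sketch: the top-level `combine` mirrors the method above (written with `flat_map` rather than `inject`), and the sample arrays are made up for illustration.

```ruby
# Stand-alone version of Sentence#combine: every string in `a` followed by
# every string in `b`. An empty `a` means nothing has been accumulated yet.
def combine(a, b)
  return b if a.empty?
  a.flat_map { |val_a| b.map { |val_b| val_a + val_b } }
end

# A sequence of parts is folded left to right, as Sentence#values does
# with its child elements.
parts = [["maybe "], ["this is", "that was"], [" some day"]]
sentences = parts.inject([]) { |values, part| combine(values, part) }
p sentences

# Alternatives, by contrast, are simply concatenated, as in Multiple#values.
alternatives = ["nice"] + ["ugly"]
p alternatives
```

Sequences multiply the number of results while alternatives add to it, which is why the example sentence yields 2 × (2 × 2 + 2) = 12 strings.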

The following example program shows that parsing the example sentence you gave is quite simple.

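require "treetop"
# Load the node classes from the file next to this script.
require_relative "sentence_nodes"

str = 'maybe (this is|that was) some' +
      ' ((nice|ugly) (day|night)|(strange (weather|time)))'

# Compiles sentences.treetop (looked up in the current directory)
# and defines SentencesParser.
Treetop.load "sentences"
if sentence = SentencesParser.new.parse(str)
  puts sentence.values
else
  puts "Parse error"
end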

The output of this program is:

maybe this is some nice day 
maybe this is some nice night 
maybe this is some ugly day 
maybe this is some ugly night 
maybe this is some strange weather 
maybe this is some strange time 
maybe that was some nice day 
maybe that was some nice night 
maybe that was some ugly day 
maybe that was some ugly night 
maybe that was some strange weather 
maybe that was some strange time 

You can also access the syntax tree:

p sentence 

The output is here

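There you have it: an extensible parsing solution that should come quite close to what you want to do, in about 50 lines of code. Does that help?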

Thanks, I had read the examples on the net, but I didn't understand how I could read nested parentheses... – astropanic 2010-03-04 14:56:11

Thank you! You are my hero :) – astropanic 2010-03-04 19:55:34

Nice video: http://www.bestechvideos.com/2008/07/18/rubyconf-2007-treetop-syntactic-analysis-with-ruby – astropanic 2010-03-05 06:37:23