2017-08-07 83 views
irb(main):161:0> "Ready for your my next session?".scan(/[A-Za-z]+|\d+|. /) 
=> ["Ready", "for", "your", "my", "next", "session"] 
=> ["Ready", "for", "your", "my", "next", "session", "?"] #==> EXPECTED 
irb(main):162:0> "yo mr. menon how are you? call at 9 a.m. \"okay\"".scan(/[A-Za-z]+|\d+|. /) 
=> ["yo", "mr", ". ", "menon", "how", "are", "you", "? ", "call", "at", "9", "a", "m", ". ", "okay"] 
=> ["yo", "mr", ". ", "menon", "how", "are", "you", "? ", "call", "at", "9", "a",".", "m", ".", "``", "okay", "''"] #==> EXPECTED 

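Ruby string scan returns different strings

I am trying to tokenize a string with scan(/[A-Za-z]+|\d+|. /), including punctuation and even the escaped quotes \" inside the string.

But it gives different results depending on how the string is structured. How can I fix this?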

'Expected: "``", "okay", "''"' - are you kidding? `Regexp#scan` cannot convert typewriter double quotes into what you expect. – mudasobwa

_Sidenote:_ to match punctuation, the regex engine has a dedicated matcher: [`\p{Punct}`](https://ruby-doc.org/core-2.4.1/Regexp.html#class-Regexp-label-Character+Properties). – mudasobwa

@mudasobwa If I knew, I wouldn't be joking ;) If it can't convert them, then how do I correct the output into proper tokens? – arjun

Answer

r =/
    (?:   # begin a non-capture group 
     \"?  # optionally (?) match a double-quote 
     \p{alpha}+ # match one or more letters 
     \"?  # optionally (?) match a double-quote 
    )   # end non-capture group 
    |   # or 
    \d+   # match one or more digits 
    |   # or 
    [.,?!:;]  # match a punctuation mark 
    /x   # free-spacing regex definition mode 

"yo mr. menon how are you? call at 9 a.m. \"okay\"".scan(r) 
    #=> ["yo", "mr", ".", "menon", "how", "are", "you", "?", "call", "at", "9", 
    # "a", ".", "m", ".", "\"okay\""] 
puts "\"okay\"" 
    # "okay" 
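
The result above keeps each double quote attached to its word, while the question expected separate `` and '' tokens. As the comments point out, scan on its own cannot rewrite characters, so one option is a small post-processing step. What follows is only a minimal sketch, not the method of this answer: the `TOKEN` pattern and the `tokenize_with_quotes` helper are hypothetical names, and it assumes the double quotes are balanced and strictly alternate between opening and closing.

```ruby
# Hypothetical pattern: like the answer's regex, but a double quote is its own token.
TOKEN = /"|\p{Alpha}+|\d+|[.,?!:;]/

# Hypothetical helper: replaces each double-quote token with `` (opening)
# or '' (closing), assuming the quotes in the text are balanced.
def tokenize_with_quotes(text)
  inside = false
  text.scan(TOKEN).map do |tok|
    next tok unless tok == '"'   # leave every other token unchanged
    inside = !inside             # toggle on each quote encountered
    inside ? '``' : "''"
  end
end

p tokenize_with_quotes("call at 9 a.m. \"okay\"")
#=> ["call", "at", "9", "a", ".", "m", ".", "``", "okay", "''"]
```

If a string contains an unbalanced quote, the toggle goes out of step, so real input may need a more careful rule.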

The regular expression is conventionally written as:

/(?:\"?\p{alpha}+\"?)|\d+|[.,?!:;]/
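
As for why the pattern in the question behaved differently on the two strings: its last alternative, `. `, only matches a character that is followed by a space, so a punctuation mark at the very end of the string, or one directly followed by a letter, is skipped. The comparison below is a rough sketch. `ORIGINAL` is the question's pattern, and `WITH_PUNCT` is a hypothetical variant using the `\p{Punct}` property mentioned in the comments, which is not necessarily what the commenter had in mind.

```ruby
ORIGINAL   = /[A-Za-z]+|\d+|. /           # pattern from the question
WITH_PUNCT = /\p{Alpha}+|\d+|\p{Punct}/   # hypothetical variant, no trailing space needed

s = "Ready for your my next session?"

p s.scan(ORIGINAL)
#=> ["Ready", "for", "your", "my", "next", "session"]
p s.scan(WITH_PUNCT)
#=> ["Ready", "for", "your", "my", "next", "session", "?"]
```

Note that `\p{Punct}` also matches a double quote, so with this variant quotes come out as separate tokens instead of staying attached to the word.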