2009-06-18 333 views
3

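Regex: matching a set of words

I've been at this for a while and can't seem to work it out. Here is what I'm trying to do. Given three words, word1, word2, and word3, I want to build a regex that matches them in order, while allowing a set of other words between them (anything except a newline).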

For example, if I have the following:

word1 = what 
word2 = the 
word3 = hell 

I want to match each of the following strings with a single match:

"what the hell" 
"what in the hell" 
"what the effing hell" 
"what in the 9 doors of hell" 

I thought I could do the following (allowing 0 to 5 words between each pair of word variables):

regex = "\bword1(\b\w+\b){0,5}word2(\b\w+\b){0,5}word3\b" 

Alas, no, it doesn't work. It's important that I can specify a distance of m to n words between the words (where m is always < n).
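One reason the attempt above fails is that `\b` is a zero-width assertion and `\w+` matches only word characters, so nothing in the pattern can consume the spaces between words. (In Python, a non-raw `"\b"` is also a backspace character, so the pattern needs a raw string.) A minimal sketch of a fix, assuming Python and assuming that "word" means a run of `\w` characters; `build_pattern` is a hypothetical helper, not something from the question:

```python
import re

def build_pattern(words, m=0, n=5):
    """Join `words` in order, allowing m to n filler words between neighbours."""
    # Each filler word carries its own leading whitespace. [^\S\n]+ means
    # "whitespace other than a newline", so a match never crosses a line break.
    gap = r"(?:[^\S\n]+\w+){%d,%d}[^\S\n]+" % (m, n)
    return r"\b" + gap.join(re.escape(w) for w in words) + r"\b"

pattern = re.compile(build_pattern(["what", "the", "hell"]))

samples = [
    "what the hell",
    "what in the hell",
    "what the effing hell",
    "what in the 9 doors of hell",
]
results = [bool(pattern.search(s)) for s in samples]
print(results)  # prints [True, True, True, True]

# The original attempt, written as a raw string, cannot cross a space.
naive = r"\bwhat(\b\w+\b){0,5}the(\b\w+\b){0,5}hell\b"
print(re.search(naive, "what the hell"))  # prints None
```

Passing `m=1, n=2` would require at least one and at most two filler words in each gap. Filler words containing apostrophes or hyphens are not covered by `\w+` and would need a wider character class.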

Answers

1
$ cat try 
#! /usr/bin/perl 

use warnings; 
use strict; 

my @strings = (
    "what the hell", 
    "what in the hell", 
    "what the effing hell", 
    "what in the 9 doors of hell", 
    "hello", 
    "what the", 
    " what the hell", 
    "what the hell ", 
); 

for (@strings) { 
    print "$_: ", /^what(\s+\w+){0,5}\s+the(\s+\w+){0,5}\s+hell$/ 
        ? "match\n" 
        : "no match\n"; 
} 

$ ./try 
what the hell: match 
what in the hell: match 
what the effing hell: match 
what in the 9 doors of hell: match 
hello: no match 
what the: no match 
 what the hell: no match 
what the hell : no match 

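This is by far the most elegant solution, and it works as advertised, but it produces secondary matches (the captured groups). Should I care about that? What I care about most is that the whole string matches, with word1 at the front, word2 in the middle, and word3 at the end ("somewhere in the middle" being the word-distance part). – 2009-06-18 01:45:38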
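If the secondary matches are unwanted, one option is to make the groups non-capturing. The sketch below is a Python rendering of the Perl pattern above, on the assumption that Python is the target language; `WHOLE_LINE` and `is_whole_match` are illustrative names only:

```python
import re

# (?:...) groups without capturing, so the match object carries no subgroups.
WHOLE_LINE = re.compile(r"what(?:\s+\w+){0,5}\s+the(?:\s+\w+){0,5}\s+hell")

def is_whole_match(text):
    # fullmatch succeeds only when the pattern spans the entire string,
    # which plays the role of ^...$ in the Perl version.
    return WHOLE_LINE.fullmatch(text) is not None

print(WHOLE_LINE.groups)                              # prints 0
print(is_whole_match("what in the 9 doors of hell"))  # prints True
print(is_whole_match("what the hell "))               # prints False
```

One difference worth knowing: Perl's `$` tolerates a single trailing newline, whereas `fullmatch` does not, so input read line by line may need `rstrip("\n")` first.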

2

"\bwhat(\s*\b\w*\b\s*){0,5}the(\s*\b\w*\b\s*){0,5}hell" works for me in Ruby (and in Clojure):

list = ["what the hell", "what in the hell", "what the effing hell", 
    "what in the 9 doors of hell", "no match here hell", "what match here hell"] 

list.map{|i| /\bwhat(\s*\b\w*\b\s*){0,5}the(\s*\b\w*\b\s*){0,5}hell/.match(i) } 
=> [#<MatchData:0x12c4d1c>, #<MatchData:0x12c4d08>, #<MatchData:0x12c4cf4>, 
    #<MatchData:0x12c4ce0>, nil, nil] 

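This matches the whole phrase but returns the contents of group(1). I also tried (\s*\w*\s*){0,5}, with the same result. That is still further than I got on my own! Any suggestions? I'm doing this in Python, in case it matters. – 2009-06-18 01:31:00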
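The comment does not say which Python call was used. If it was `re.findall`, that would explain the symptom: when a pattern contains capturing groups, `re.findall` returns the group contents instead of the full match. A small sketch under that assumption, with illustrative variable names:

```python
import re

text = "what in the 9 doors of hell"

capturing = r"\bwhat(\s*\b\w*\b\s*){0,5}the(\s*\b\w*\b\s*){0,5}hell"
non_capturing = r"\bwhat(?:\s*\b\w*\b\s*){0,5}the(?:\s*\b\w*\b\s*){0,5}hell"

# With two capturing groups, findall yields one 2-tuple per match,
# holding only the last repetition captured by each group.
with_groups = re.findall(capturing, text)

# With non-capturing groups, findall yields the full matched text.
without_groups = re.findall(non_capturing, text)

# group(0) is always the whole match, whatever the groups captured.
whole = re.search(capturing, text).group(0)

print(with_groups)
print(without_groups)  # prints ['what in the 9 doors of hell']
print(whole)           # prints what in the 9 doors of hell
```

Either switching to `(?:...)` or reading `group(0)` from `re.search` or `re.finditer` gives the full phrase.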

0

Works for me:

(def phrases ["what the hell" "what in the hell" "what the effing hell" 
       "what in the 9 doors of hell"]) 

(def regexp #"\bwhat(\s*\b\w*\b\s*){0,5}the(\s*\b\w*\b\s*){0,5}hell") 

(defn valid? [] 
    (every? identity (map #(re-matches regexp %) phrases))) 

(valid?) ; <-- true 

This follows Ben Hughes's pattern.