Java regex matching pattern

2012-06-12

I need to check a pattern against some text (I have to check whether my pattern occurs in a lot of texts).

Here is my example:

String pattern = "^[a-zA-Z ]*toto win(\\W)*[a-zA-Z ]*$";
if ("toto win because of".matches(pattern))
    System.out.println("we have a winner");
else
    System.out.println("we DON'T have a winner");

For my tests, the pattern must match, but with my regex it doesn't. Must match:

" toto win bla bla" 

"toto win because of" 
"toto win. bla bla" 


"here. toto win. bla bla" 
"here? toto win. bla bla" 

"here %dfddfd . toto win. bla bla" 

Must not match:

" -toto win bla bla" 
" pretoto win bla bla" 

I tried to do this with my regex, but it doesn't work.

Can you point out what I'm doing wrong?
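To see which samples the question's pattern actually gets wrong, a small harness can run it with String.matches (which requires the whole string to match) over both lists above. This is only a sketch; the class and method names are illustrative, not from the question.

```java
public class QuestionPatternCheck {
    // The pattern from the question, unchanged.
    static final String PATTERN = "^[a-zA-Z ]*toto win(\\W)*[a-zA-Z ]*$";

    static final String[] MUST_MATCH = {
        " toto win bla bla",
        "toto win because of",
        "toto win. bla bla",
        "here. toto win. bla bla",
        "here? toto win. bla bla",
        "here %dfddfd . toto win. bla bla"
    };

    static final String[] MUST_NOT_MATCH = {
        " -toto win bla bla",
        " pretoto win bla bla"
    };

    // String.matches anchors the whole input, so ^ and $ are redundant here.
    static boolean matches(String text) {
        return text.matches(PATTERN);
    }

    public static void main(String[] args) {
        for (String s : MUST_MATCH) {
            System.out.println("should match     [" + s + "] -> " + matches(s));
        }
        for (String s : MUST_NOT_MATCH) {
            System.out.println("should not match [" + s + "] -> " + matches(s));
        }
    }
}
```

Run against the samples, the three "here..." strings fail because ".", "?" and "%" are not in [a-zA-Z ], while " pretoto win bla bla" is accepted because [a-zA-Z ]* can consume " pre" right before "toto".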

Will the quotes appear in the input string? – Cylian

It can be anything. It's ordinary text. –

Please [don't add signatures and taglines to your posts](http://stackoverflow.com/faq#signatures). Also, you keep misspelling "a lot": there is a space between "a" and "lot". – meagar

Answers

1

This will work:

(?im)^[?.\s%a-z]*?\btoto win\b.+$ 

Explanation

"(?im)" +   // Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
"^" +    // Assert position at the beginning of a line (at beginning of the string or after a line break character) 
"[?.\\s%a-z]" + // Match a single character present in the list below 
        // One of the characters “?.” 
        // A whitespace character (spaces, tabs, and line breaks) 
        // The character “%” 
        // A character in the range between “a” and “z” 
    "*?" +   // Between zero and unlimited times, as few times as possible, expanding as needed (lazy) 
"\\b" +   // Assert position at a word boundary 
"toto\\ win" +  // Match the characters “toto win” literally 
"\\b" +   // Assert position at a word boundary 
"." +    // Match any single character that is not a line break character 
    "+" +    // Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
"$"    // Assert position at the end of a line (at the end of the string or before a line break character) 
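In Java source every backslash in this pattern has to be doubled. A minimal sketch, assuming the pattern is applied with Matcher.find() to one input string at a time (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

public class FirstAnswerCheck {
    // (?im)^[?.\s%a-z]*?\btoto win\b.+$ written as a Java string literal.
    static final Pattern P = Pattern.compile("(?im)^[?.\\s%a-z]*?\\btoto win\\b.+$");

    static boolean accepts(String text) {
        // find() searches the input; with (?m), ^ and $ apply to each line.
        return P.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "here %dfddfd . toto win. bla bla",
            " -toto win bla bla",
            "here!toto win dfddfd"
        };
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + accepts(s));
        }
    }
}
```

Only the first sample is accepted. The last one is rejected because "!" is not in the character class [?.\s%a-z], which is the case raised in the comments on this answer.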

UPDATE 1

(?im)^[?~`'!@#$%^&*+.\s%a-z]*? toto win\b.*$

UPDATE 2

(?im)^[^-]*?\btoto win\b.*$ 

UPDATE 3

(?im)^.*?(?<!-)toto win\b.*$ 

Explanation

"(?im)" +  // Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
"^" +   // Assert position at the beginning of a line (at beginning of the string or after a line break character) 
"." +   // Match any single character that is not a line break character 
    "*?" +   // Between zero and unlimited times, as few times as possible, expanding as needed (lazy) 
"(?<!" +  // Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) 
    "-" +   // Match the character “-” literally 
")" + 
"toto\\ win" + // Match the characters “toto win” literally 
"\\b" +   // Assert position at a word boundary 
"." +   // Match any single character that is not a line break character 
    "*" +   // Between zero and unlimited times, as many times as possible, giving back as needed (greedy) 
"$"    // Assert position at the end of a line (at the end of the string or before a line break character) 

The regex needs to be escaped for use inside code.
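As an illustration of that escaping, here are UPDATE 2 and UPDATE 3 written as Java string literals. This sketch assumes they are applied with Matcher.find(); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class UpdatedAnswerCheck {
    // UPDATE 2: (?im)^[^-]*?\btoto win\b.*$
    static final Pattern UPDATE_2 = Pattern.compile("(?im)^[^-]*?\\btoto win\\b.*$");
    // UPDATE 3: (?im)^.*?(?<!-)toto win\b.*$
    static final Pattern UPDATE_3 = Pattern.compile("(?im)^.*?(?<!-)toto win\\b.*$");

    static boolean update2(String text) {
        return UPDATE_2.matcher(text).find();
    }

    static boolean update3(String text) {
        return UPDATE_3.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {"here!toto win dfddfd", " -toto win bla bla", " pretoto win bla bla"};
        for (String s : samples) {
            System.out.println("[" + s + "] update2=" + update2(s) + " update3=" + update3(s));
        }
    }
}
```

Both versions accept "here!toto win dfddfd" and reject " -toto win bla bla". They differ on " pretoto win bla bla": UPDATE 2 rejects it because it needs a word boundary before "toto", while the lookbehind in UPDATE 3 only rules out a directly preceding "-", so UPDATE 3 accepts it.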

This string doesn't match: "here!toto win dfddfd" –

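Actually, there can be any character. Imagine text on a website: we can have anything. The only thing I must not have is text or a character (such as "-") directly in front of it, as in "blatoto win" or "-toto win". –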

Great. It does what I want. Thanks a lot. –

0

Your pattern is missing the space between "win" and the next word.

Try this: \\stoto\\swin\\s\\w

You can try out your regex at http://gskinner.com/RegExr/
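The suggested pattern is already written as a Java string literal. A sketch applying it, assuming Matcher.find() (the answer does not say whether find() or matches() is intended; the class name is hypothetical):

```java
import java.util.regex.Pattern;

public class WhitespacePatternCheck {
    // Whitespace, "toto", whitespace, "win", whitespace, then one word character.
    static final Pattern P = Pattern.compile("\\stoto\\swin\\s\\w");

    static boolean found(String text) {
        // Assumption: search anywhere in the input rather than match the whole string.
        return P.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {" toto win bla bla", "toto win because of", "here. toto win. bla bla", " -toto win bla bla"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + found(s));
        }
    }
}
```

Because it requires whitespace both before "toto" and right after "win", it finds " toto win bla bla" but not "toto win because of" (no leading whitespace) or "here. toto win. bla bla" (a "." follows "win").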

Do you mean I need String pattern = "(\\s)*toto win(\\s)*(\\W)*"; ? –

@CC. See my edit – dantuch

@CC, sorry, it should work properly now. – dantuch

0

The following regex

^[a-zA-Z. ]*toto win[a-zA-Z. ]*$ 

will match

toto win bla bla 
toto win because of 
toto win. bla bla 

and will not match

-toto win bla bla
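A sketch of this regex in Java, assuming it is used with String.matches so the whole string must match (the class name is hypothetical):

```java
public class DotClassCheck {
    // Pattern from the answer; inside [...] the '.' is a literal dot.
    static final String PATTERN = "^[a-zA-Z. ]*toto win[a-zA-Z. ]*$";

    static boolean matches(String text) {
        // String.matches requires the entire input to match.
        return text.matches(PATTERN);
    }

    public static void main(String[] args) {
        String[] samples = {
            "toto win bla bla", "toto win because of", "toto win. bla bla",
            "-toto win bla bla", " pretoto win bla bla", "here? toto win. bla bla"
        };
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + matches(s));
        }
    }
}
```

The first three samples are accepted and "-toto win bla bla" is rejected, as stated above. " pretoto win bla bla" is still accepted because letters may directly precede "toto", and "here? toto win. bla bla" is rejected because "?" is not in the class.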

This looks great, but strings like "toto win. bla bla" don't work. Any ideas? –

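I updated my answer. In your question you mentioned "special" characters; I added the dot. Account for whatever you consider special by adding it to the character class. Do you see? Add characters as needed. – buckley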

I see. I just updated my question. It still doesn't fully work. I don't know how to require that there is no character right before my pattern. –

1

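Just change your code to String pattern = "\\s*toto win[\\w\\s]*";

\W means a non-word character, and \w means a word character ([a-zA-Z_0-9]).

[\\w\\s]* will match any number of word characters and whitespace after "toto win".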
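A quick way to see the difference between \w and \W in Java is to test single characters with String.matches. The class and method names here are illustrative.

```java
public class WordClassDemo {
    // By default Java's \w is [a-zA-Z_0-9]; it only widens to Unicode
    // letters and digits when Pattern.UNICODE_CHARACTER_CLASS is enabled.
    static boolean isWordChar(String c) {
        return c.matches("\\w");
    }

    // \W is the complement of \w.
    static boolean isNonWordChar(String c) {
        return c.matches("\\W");
    }

    public static void main(String[] args) {
        String[] chars = {"a", "Z", "7", "_", ".", " ", "-", "\u00e9"};
        for (String c : chars) {
            System.out.println("'" + c + "' \\w=" + isWordChar(c) + " \\W=" + isNonWordChar(c));
        }
    }
}
```

Letters, digits and "_" are word characters; ".", " " and "-" are not, and neither is an accented letter such as "\u00e9" under the default (non-Unicode) classes.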

UPDATE

To reflect the new requirements, this expression will work:

"((.*\\s)+|^)toto win[\\w\\s\\p{Punct}]*" 

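((.*\\s)+|^) matches anything followed by at least one whitespace character, or the beginning of the input.

[\\w\\s\\p{Punct}]* matches any combination of word characters, digits, whitespace and punctuation.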
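A sketch comparing the first and the updated expression, assuming both are used with String.matches as in the answer (the class and method names are hypothetical):

```java
public class PunctPatternCheck {
    // First version from the answer.
    static final String FIRST = "\\s*toto win[\\w\\s]*";
    // Updated version; \p{Punct} is the US-ASCII POSIX punctuation class.
    static final String UPDATED = "((.*\\s)+|^)toto win[\\w\\s\\p{Punct}]*";

    static boolean first(String text) {
        return text.matches(FIRST);
    }

    static boolean updated(String text) {
        return text.matches(UPDATED);
    }

    public static void main(String[] args) {
        String[] samples = {
            " toto win bla bla", "toto win. bla bla", "here %dfddfd . toto win. bla bla",
            " -toto win bla bla", " pretoto win bla bla"
        };
        for (String s : samples) {
            System.out.println("[" + s + "] first=" + first(s) + " updated=" + updated(s));
        }
    }
}
```

The first version fails as soon as a "." follows "win", since "." is not in [\w\s]. The updated version accepts all of the question's "must match" samples and rejects both "must not match" samples, because "toto" has to follow whitespace or sit at the very start of the input.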

0

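It would be easier if you included the actual requirements rather than a list of things to match. I strongly suspect that "toto winabc" should not match, but I'm not sure, because you haven't included such an example or explained the requirements. Anyway, this works for all of your current examples: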

import java.util.regex.Pattern;

public class TotoWin { // class name is illustrative

    static String[] matchThese = new String[] {
        " toto win bla bla",
        "toto win because of",
        "toto win. bla bla",
        "here. toto win. bla bla",
        "here? toto win. bla bla",
        "here %dfddfd . toto win. bla bla"
    };

    static String[] dontMatchThese = new String[] {
        " -toto win bla bla",
        " pretoto win bla bla"
    };

    public static void main(String[] args) {
        // either the start of the input or a whitespace character, followed by "toto win"
        Pattern p = Pattern.compile("(^|\\s)toto win");

        System.out.println("Should match:");
        for (String s : matchThese) {
            System.out.println(p.matcher(s).find());
        }

        System.out.println("Shouldn't match:");
        for (String s : dontMatchThese) {
            System.out.println(p.matcher(s).find());
        }
    }
}
I gave examples of the kind of text that should match. The text can be anything, so I can't use your approach. Thanks anyway. –