2017-08-07
1

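Regex matching sentences in which "this" or "that" occurs at least twice

I want to create a regular expression in PHP that searches a text for sentences containing "this" or "that" at least twice (so at least twice "this" or at least twice "that").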

We are stuck at:

([^.?!]*(\bthis|that\b){2,}[^.?!]*[.|!|?]+) 

Try ['~[^.?!]*\b(th(?:is|at))\b[^.?!]*\b\1\b[^.?!]*[.?!]~i'](https://regex101.com/r/3LzI0V/1) –


Is it 'this' or 'that' twice, or 'this' twice, or 'that' twice? Also, don't use '|' inside a character class unless you want to allow that character. – chris85
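To illustrate the point about '|' inside a character class, here is a minimal sketch. It uses Python's `re` module purely for illustration (the question is about PHP, and the same character-class rule applies in PCRE); the sample string is made up.

```python
import re

# Terminator class from the question: inside [...] the pipe is a literal character.
with_pipes = re.compile(r"[.|!|?]+")
# Probably what was intended: only sentence-ending punctuation.
without_pipes = re.compile(r"[.!?]+")

sample = "a|b"  # hypothetical input: a pipe, but no sentence terminator

pipe_hit = with_pipes.search(sample)
clean_hit = without_pipes.search(sample)

print(pipe_hit.group(0) if pipe_hit else None)  # the literal pipe is accepted
print(clean_hit)                                # no terminator found
```

The first class treats the pipe as an ordinary member, so a stray '|' in the text would count as a sentence terminator.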


Please define "sentence" in your question. Is a sentence simply whatever your pattern is looking for, or do you have another definition? –

Answers

3

Use this pattern: (\b(?:this|that)\b).*?\1 (Demo)

(                 # Capturing group (1)
  \b              # Word boundary
  (?:             # Non-capturing group
    this          # "this"
    |             # OR
    that          # "that"
  )               # End of non-capturing group
  \b              # Word boundary
)                 # End of capturing group (1)
.                 # Any character except a line break
*?                # Zero or more times (lazy)
\1                # Backreference to group (1)
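As a rough check of how this pattern behaves, here is a minimal sketch in Python's `re` module (an assumption for illustration only; the question targets PHP, where the same pattern could be passed to a PCRE function). Both sample sentences are made up.

```python
import re

# Pattern from the answer above: a whole-word "this"/"that", a lazy gap,
# then the same captured text again.
pattern = re.compile(r"(\b(?:this|that)\b).*?\1")

# Hypothetical sample sentences, not taken from the thread.
repeated = "I know that you said that."
embedded = "He took this lathis away."

first = pattern.search(repeated)
second = pattern.search(embedded)

print(first.group(0))   # span from the first "that" to the second "that"
print(second.group(0))  # the backreference also matches inside "lathis"
```

Note that the match covers only the span between the two occurrences, not the whole sentence, and that `\1` carries no word boundaries of its own, so the second occurrence may sit inside a longer word.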
-1

Use this:

.*(this|that).*(this|that).* 

http://regexr.com/3ggq5

UPDATE

Here is another way, based on your regex:

.*(this\s?|that\s?){2,}.*[\.\n]* 

http://regexr.com/3ggq8
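A minimal sketch of what the first pattern in this answer accepts, written with Python's `re` module for illustration only (the question is about PHP). The two sample sentences are made up.

```python
import re

# First pattern from this answer: two independent alternations, no word boundaries.
pattern = re.compile(r".*(this|that).*(this|that).*")

# Hypothetical sample sentences, not taken from the thread.
mixed = "Take this, not that."              # one "this" and one "that"
substring = "The thistles hid the lathis."  # neither word stands on its own

mixed_match = pattern.search(mixed)
substring_match = pattern.search(substring)

print(mixed_match.groups())
print(substring_match.groups())
```

Because the two groups are independent, a sentence with one "this" and one "that" is accepted, and because there are no word boundaries, "this" inside "thistles" or "lathis" counts as well. Whether that is acceptable depends on how the question's "at least twice" is meant to be read.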


@chris85 Done. –


Using a backreference could make this cleaner: '(th(?:is|at)).*\1'. Although I'm not clear on the OP's actual goal... – chris85


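But it only matches sentences that start and end with "this" or "that"; the question is to match all sentences that contain "that" or "this" at least twice. –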

0

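This is mostly Wiktor's pattern, with a deviation to isolate the sentences and omit the leading whitespace characters from the full-string matches.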

Pattern: /\b[^.?!]*\b(th(?:is|at))\b[^.?!]*(\b\1\b)[^.?!]*\b[.!?]/i

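Here is a sample text that demonstrates how the other answers fail to disqualify unwanted matches for "word boundary" or "case-insensitive" reasons (Demo; in the demo, a capture group is applied to \b\1\b to show which substrings qualify each sentence for matching):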

This is nothing. 
That is what that will be. 
The Indian policeman hit the thief with his lathis before pushing him into the thistles. 
This Indian policeman hit the thief with this lathis before pushing him into the thistles. This is that and that. 
The Indian policeman hit the thief with this lathis before pushing him into the thistles. 

For a formal breakdown of the pattern, see the demo link.

In plain terms:

/                  #start of pattern
\b                 #match start of a sentence on a "word character"
[^.?!]*            #match zero or more characters not a dot, question mark, or exclamation
\b(th(?:is|at))\b  #match whole word "this" or "that" (not thistle)
[^.?!]*            #match zero or more characters not a dot, question mark, or exclamation
\b\1\b             #match the earlier captured whole word "this" or "that"
[^.?!]*            #match zero or more characters not a dot, question mark, or exclamation
\b                 #match second last character of sentence as "word character"
[.!?]              #match the end of a sentence: dot, question mark, exclamation
/                  #end of pattern
i                  #make pattern case-insensitive

My pattern matches three of the sentences from the sample text above:

That is what that will be. 
This Indian policeman hit the thief with this lathis before pushing him into the thistles. 
This is that and that. 
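The walkthrough above can be reproduced with a short script. This is a sketch in Python's `re` module, chosen only for illustration (the question targets PHP, where the same pattern could be used with a PCRE function); the `/.../i` delimiters and flag are translated to `re.IGNORECASE`, and the variable names are my own.

```python
import re

# Pattern from this answer; the trailing /i flag becomes re.IGNORECASE.
pattern = re.compile(
    r"\b[^.?!]*\b(th(?:is|at))\b[^.?!]*(\b\1\b)[^.?!]*\b[.!?]",
    re.IGNORECASE,
)

# Sample text from this answer, one entry per line of the sample.
sample = "\n".join([
    "This is nothing.",
    "That is what that will be.",
    "The Indian policeman hit the thief with his lathis before pushing him into the thistles.",
    "This Indian policeman hit the thief with this lathis before pushing him into the thistles. This is that and that.",
    "The Indian policeman hit the thief with this lathis before pushing him into the thistles.",
])

# Collect the full-string match for every qualifying sentence.
matches = [m.group(0) for m in pattern.finditer(sample)]
for sentence in matches:
    print(sentence)
```

Each match starts on a word character and ends at the sentence terminator, so no leading whitespace is included. Sentences that contain "this" only once as a whole word, or only inside "lathis" or "thistles", are skipped.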

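*Note: previously I was using \s*\K at the start of my pattern to omit the whitespace characters. For improved efficiency, I have chosen to alter my pattern to use an additional word boundary metacharacter. If this doesn't work for your project text, it would be best to revert to my original pattern.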


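@TuurSwimberghe I see that you have been online since I submitted my answer. Did you run into any difficulties with it? I hope my answer performs best for your project. If you have any questions, please leave a comment. Although Alpha Bravo's answer has received 3 upvotes, it has not been fully thought through. See this [demo](https://regex101.com/r/xXg67T/3), which shows incorrect matches on the first and last lines. My pattern does not make those incorrect matches. – mickmackusa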