2017-02-26

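PHP regex to find a pattern and wrap it in an anchor tag

I have a string containing movie titles and release years. I want to detect the Title (Year) pattern and, where it matches, wrap it in an anchor tag.

Wrapping it is easy. But is it possible to write a regex that matches this pattern when I don't know in advance what the movie names will be?

Example: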

$str = 'A random string with movie titles in it. 
Movies like The Thing (1984) and other titles like Captain America Civil War (2016). 
The movies could be anywhere in this string. 
And some movies like 28 Days Later (2002) could start with a number.'; 

So the pattern will always be a Title (starting with an uppercase letter) and will end with (Year).

This is what I have so far:

if(preg_match('/^\p{Lu}[\w%+\/-]+\([0-9]+\)/', $str)){ 
    error_log('MATCH'); 
} 
else{ 
    error_log('NO MATCH'); 
} 

Currently this does not work. As I understand it, this is what it should do:

^\p{Lu} //match a word beginning with an uppercase letter

[\w%+\/-] //with any number of characters following it

+\([0-9]+\) //ending with an integer

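Where am I going wrong with this?

'([A-Z]{1}[a-z]+\s?)+\(\d+\)' is what you are looking for. To make testing regex patterns easier, I use RegExr (http://regexr.com/).

Could a movie title start right at the beginning of the string? What if a title consists entirely of digits, like '1984 (1984)'? Does that need to be handled? Siam's solution, while clever, does not match anything in "1984 (1984) is a 30+ year old movie." I just want to make sure the sample you provided covers all possible cases. – mickmackusa

On the other hand, Toto's regex does match in "1984 (1984) is a 30+ year old movie." and takes 164 fewer steps. On my extended sample it appears to be the better choice. – mickmackusa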

Answers

The following regex should do it:

(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\) 

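Explanation

  • (?-i) turns off case-insensitive mode, so matching is case-sensitive
  • (?<=[a-z]\s) is a lookbehind: the match must be preceded by a lowercase letter and a whitespace character
  • [A-Z\d] matches one uppercase letter or digit
  • .*? lazily matches any characters (as few as possible)
  • \(\d+\) matches one or more digits enclosed in literal parentheses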

PHP

<?php 
$regex = '/(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)/'; 
$str = 'A random string with movie titles in it. 
     Movies like The Thing (1984) and other titles like Captain America Civil War (2016). 
     The movies could be anywhere in this string. 
     And some movies like 28 Days Later (2002) could start with a number.'; 
preg_match_all($regex, $str, $matches); 
print_r($matches); 
?> 
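
Neither snippet above shows the wrapping step the question asks for, so here is a minimal sketch of it using the same pattern with preg_replace. The `href="#"` value is a placeholder assumption, not something taken from the question; `$0` in the replacement string stands for the whole match.

```php
<?php
// Same pattern as in the answer above.
$regex = '/(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)/';
$str = 'Movies like The Thing (1984) and other titles like Captain America Civil War (2016).';

// $0 is the full matched text; href="#" is only a placeholder (assumption).
$linked = preg_replace($regex, '<a href="#">$0</a>', $str);
echo $linked, PHP_EOL;
```

Because of the lookbehind, a title at the very start of the string would not be wrapped by this pattern.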

Great answer, and thanks for the explanation. I still have a long way to go!

This regex does the job:

~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~ 

Explanation:

~             : regex delimiter
(?:           : start of non-capturing group
  [A-Z]       : 1 capital letter (use \p{Lu} to match titles in any language)
  [a-zA-Z]+   : 1 or more letters (use \p{L} to match titles in any language)
  \s+         : 1 or more whitespace characters
  |           : OR
  \d+         : 1 or more digits
  \s+         : 1 or more whitespace characters
)+            : end of group, repeated 1 or more times
\(\d+\)       : 1 or more digits surrounded by parentheses (use \d{4} if the year is always 4 digits)
~             : regex delimiter
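
The Unicode substitutions suggested in the notes above could look roughly like the sketch below. It assumes the input is UTF-8, which is why the `u` modifier is added, and the sample sentence is invented for illustration.

```php
<?php
// \p{Lu} = any uppercase letter, \p{L} = any letter, \d{4} = exactly four digits.
// The u modifier tells PCRE to treat pattern and subject as UTF-8 (assumed encoding).
$regex = '~(?:\p{Lu}\p{L}+\s+|\d+\s+)+\(\d{4}\)~u';

// Illustrative sample text, not from the original question.
$str = 'Films such as Amélie (2001) and Élite Squad (2007) are mentioned here.';

preg_match_all($regex, $str, $matches);
print_r($matches[0]);
```

As with the ASCII version, each title word must be at least two letters long, so a one-letter word such as "A" inside a title would break the match.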

Implementation:

$str = 'A random string with movie titles in it. 
Movies like The Thing (1984) and other titles like Captain America Civil War (2016). 
The movies could be anywhere in this string. 
And some movies like 28 Days Later (2002) could start with a number.'; 

if (preg_match_all('~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~', $str, $match)) { 
    print_r($match); 
    error_log('MATCH'); 
} 
else{ 
    error_log('NO MATCH'); 
} 

Result:

Array 
(
    [0] => Array 
     (
      [0] => The Thing (1984) 
      [1] => Captain America Civil War (2016) 
      [2] => 28 Days Later (2002) 
     ) 

) 
MATCH
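
To go from matching to the anchor tags the question asks for, one option is preg_replace_callback, which allows a different link per title. This is only a sketch: the `/search?q=...` URL scheme is a hypothetical placeholder, and the shortened input string is for illustration.

```php
<?php
$str = 'Movies like The Thing (1984) and 28 Days Later (2002) are here.';

$linked = preg_replace_callback(
    '~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~',
    function (array $m): string {
        // /search?q=... is a hypothetical URL scheme, used only for illustration.
        $href = '/search?q=' . urlencode($m[0]);
        // Escape the visible text in case a title ever contains HTML-special characters.
        return '<a href="' . $href . '">' . htmlspecialchars($m[0]) . '</a>';
    },
    $str
);

echo $linked, PHP_EOL;
```

The callback receives the match array, so `$m[0]` is the full "Title (Year)" text for each hit.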