2017-02-23

How to exclude cases in a PHP regex

I want to capture specific lines that contain a colon. I tried this:

preg_match_all("/^(.+):(.+)/im", $input_lines, $output_array); 
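To see where this attempt falls short, here is a minimal sketch that runs the same pattern on a few of the sample lines ($sample is an illustrative subset of the input, not the full data). The URL and time lines are captured even though they should be excluded, and the line ending in a colon is missed because the second (.+) cannot match across the line break.

```php
<?php
// Illustrative subset of the question's input (assumption: \n line endings).
$sample = "bjorge:world\nsome http://hi.com ok\ntime is 01:20:24 now\nI need to be:\ncaptured";

// The original attempt: m makes ^ match at every line start; i has no effect here.
preg_match_all("/^(.+):(.+)/im", $sample, $output_array);

// $output_array[0] holds: bjorge:world, some http://hi.com ok, time is 01:20:24 now
// "I need to be:" is not matched: nothing follows the colon on that line.
print_r($output_array[0]);
```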

The input data:

last_name, first_name 
bjorge philip: hello world  
bjorge:world 
kardashian, kim 
some http://hi.com ok 
jim https://hey.com yes 
same http://www.vim.com:2018 why 
it's about 20/08/2018 1:23 pm 
time is 01:20:24 now 
capture my name : my name is micky mouse 
mercury, freddie 
I need to be: 
captured 
    capture me :  
if you can 
where is : freddie 
freddie is not: 
home 

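I need to capture these lines:

bjorge philip: hello world
bjorge:world
I need to be: captured
capture me : if you can
where is : freddie
freddie is not: home
capture my name : my name is micky mouse

and exclude any line that contains a time or a URL. A line that ends in a colon should be joined with the line that follows it.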

In your example, 'colon space' would work. – nogad

Make your life easier: iterate over each line individually instead of trying to write a multiline regex. – Sammitch
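Sammitch's suggestion can be sketched as follows: a minimal PHP version that splits the text into lines and applies simple per-line checks instead of one multiline regex. The function name captureColonLines() and the exact exclusion tests (https?: for URLs, \d:\d for times) are illustrative assumptions, not code from this thread.

```php
<?php
// Minimal sketch of the line-by-line approach.
// captureColonLines() is a hypothetical helper; the exclusion rules are assumptions.
function captureColonLines(string $text): array
{
    $lines = preg_split('/\R/', $text); // split on \n, \r\n or \r
    $count = count($lines);
    $result = [];

    for ($i = 0; $i < $count; $i++) {
        $line = trim($lines[$i]);

        if (strpos($line, ':') === false) {    // no colon: skip
            continue;
        }
        if (preg_match('/https?:/i', $line)) { // contains http: or https:: skip
            continue;
        }
        if (preg_match('/\d:\d/', $line)) {    // contains a time such as 1:23: skip
            continue;
        }
        // A line ending in a colon is joined with the following line, if there is one.
        if (substr($line, -1) === ':' && $i + 1 < $count) {
            $i++;
            $line .= ' ' . trim($lines[$i]);
        }
        $result[] = $line;
    }
    return $result;
}

$input = <<<'TXT'
last_name, first_name
bjorge philip: hello world
bjorge:world
kardashian, kim
some http://hi.com ok
jim https://hey.com yes
same http://www.vim.com:2018 why
it's about 20/08/2018 1:23 pm
time is 01:20:24 now
capture my name : my name is micky mouse
mercury, freddie
I need to be:
captured
    capture me :
if you can
where is : freddie
freddie is not:
home
TXT;

$captured = captureColonLines($input);
foreach ($captured as $line) {
    echo $line, PHP_EOL;
}
```

Because each rule is a separate if statement, adding another exclusion does not require reworking a combined pattern. A line that ends in a colon with no following line is kept on its own.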

@nogad The space is optional. Sorry, I've edited my example. – Kal

Answer

<?php 
$input_lines="last_name, first_name 
bjorge philip: hello world  
bjorge:world 
kardashian, kim 
some http://hi.com ok 
jim https://hey.com yes 
same http://www.vim.com:2018 why 
it's about 20/08/2018 1:23 pm 
time is 01:20:24 now 
capture my name : my name is micky mouse 
mercury, freddie 
I need to be: 
captured 
    capture me :  
if you can 
where is : freddie 
freddie is not: 
home "; 

preg_match_all('/^(?![^:\n]*$|.*?https?:|.*\d:\d+)(.*?:\s*\r?\n.*|.*?:\s?.+)/m', $input_lines, $output_array);
// \r? handles Windows (\r\n) line endings; it can be omitted for \n-only input

foreach($output_array[0] as $output){
    // two-line matches keep their line break; a browser renders it as a space
    echo $output,"<br>";
}

Regex pattern breakdown:

^               #start of any line (the m modifier makes ^ and $ match at line boundaries)
(?!             #begin negative lookahead group
  [^:\n]*$      #ignore lines with no colon
  |             #OR
  .*?https?:    #ignore lines with http: or https:
  |             #OR
  .*\d:\d+      #ignore lines with digit colon digit (times)
)               #end negative lookahead group
(               #begin capture group
  .*?:\s*\r?\n.*  #capture 2 lines if the 1st line has a colon followed only by
                  # optional whitespace before the newline
  |             #OR
  .*?:\s?.+     #capture 1 line when it contains a colon followed by
                # 0 or 1 whitespace character, then 1 or more characters
)               #end capture group

When viewed in a browser, this outputs:

bjorge philip: hello world 
bjorge:world 
capture my name : my name is micky mouse 
I need to be: captured 
capture me : if you can 
where is : freddie 
freddie is not: home 

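I've spent a fair amount of time writing this solution for you. Unless you extend the sample set further, I hope it meets with your approval.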
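The answer's breakdown can also live inside the pattern itself. A sketch using PHP's x (extended) modifier, which ignores unescaped whitespace and treats # as the start of a comment outside character classes. It assumes the m modifier so that ^ anchors every line, uses [^:\n] so the no-colon test stays on the current line, and runs on the question's sample data without trailing spaces; the line break inside two-line matches is then collapsed to a single space.

```php
<?php
// Sketch: lookahead/capture pattern with inline comments via the x modifier.
// m makes ^ and $ match at line boundaries. No slash may appear in the
// comments, because PHP would read it as the closing delimiter.
$pattern = <<<'REGEX'
/
^                   # start of any line
(?!                 # begin negative lookahead
    [^:\n]*$        #   the line has no colon
  | .*?https?:      #   the line contains http: or https:
  | .*\d:\d+        #   the line contains digit colon digit (a time)
)                   # end negative lookahead
(                   # begin capture group
    .*?:\s*\r?\n.*  #   colon, optional whitespace, newline: take the next line too
  | .*?:\s?.+       #   colon, optional whitespace character, at least one more character
)                   # end capture group
/mx
REGEX;

$input_lines = <<<'TXT'
last_name, first_name
bjorge philip: hello world
bjorge:world
kardashian, kim
some http://hi.com ok
jim https://hey.com yes
same http://www.vim.com:2018 why
it's about 20/08/2018 1:23 pm
time is 01:20:24 now
capture my name : my name is micky mouse
mercury, freddie
I need to be:
captured
    capture me :
if you can
where is : freddie
freddie is not:
home
TXT;

preg_match_all($pattern, $input_lines, $matches);

// Trim each match and collapse the line break inside two-line matches.
$captured = [];
foreach ($matches[1] as $match) {
    $captured[] = preg_replace('/\s*\R\s*/', ' ', trim($match));
}
print_r($captured);
```

Keeping the comments inside the pattern means the explanation cannot drift out of sync with the regex that actually runs.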

The space is optional, so I can't rely on it as a trigger. I've edited my example. – Kal

I've updated my answer to reflect your question update. – mickmackusa

This is awesome, @mickmackusa. What would you add if a line ends with a colon, e.g. 'hello world:'? – Kal