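How do I get URLs from a text string?

2013-04-27

I have a string that contains URLs mixed with other text. I want to collect all of the URLs into the $matches array. However, the code below does not capture every URL in $matches: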

$matches = array(); 
$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL"; 
preg_match_all('$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i', $text, $matches); 
print_r($matches); 

The code above matches:

http://tinyurl.com/9uxdwc 
http://google.com 
http://tinyurl.com/787988 

but it ignores the URLs that have no scheme, including these 4:

schoollife.edu 
hello.net 
news.yahoo.com 
en.wikipedia.org/wiki/Country_music 

Could you show me, with an example, how to modify the code above so that it captures all of the URLs?

Your regex requires an http/https/ftp/file scheme. Make it optional. – sevenseacat 2013-04-27 08:11:50

@sevenseacat I have a similar problem. Could you show an example with the modified regex? – 2013-04-27 08:45:00

See my updated answer – 2013-04-27 08:57:51

Answers

Is this what you need?

$matches = array(); 
$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL"; 
preg_match_all('$\b((https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*\.[-A-Z0-9+&@#/%=~_|]+$i', $text, $matches); 
print_r($matches); 

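I made the protocol part optional, added a literal dot that separates the domain from the TLD, and used "+" after that dot to capture the whole string that follows it (the TLD plus any extra information such as a path).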

The result (the full matches, in $matches[0]) is:

[0] => soundfly.us 
[1] => schoollife.edu 
[2] => hello.net 
[3] => news.yahoo.com 
[4] => http://tinyurl.com/9uxdwc 
[5] => http://google.com 
[6] => http://tinyurl.com/787988 
[7] => en.wikipedia.org/wiki/Country_music 
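
To illustrate why the required dot matters and why making the scheme optional is not enough on its own, here is a minimal sketch that compares two variants on a shortened sample. The helper name `find_urls`, the variable names, and the sample string are made up for this illustration; only the second pattern is the one from this answer.

```php
<?php
// Hypothetical helper: returns only the full matches ($matches[0]).
function find_urls(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Shortened sample text, assumed for illustration.
$text = "some hello.net text http://google.com";

// Variant A: the original pattern with only the scheme made optional.
$schemeOptional = '$\b((https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i';

// Variant B: the pattern from this answer (optional scheme, literal dot, then "+").
$withDot = '$\b((https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*\.[-A-Z0-9+&@#/%=~_|]+$i';

print_r(find_urls($schemeOptional, $text)); // also picks up plain words such as "some"
print_r(find_urls($withDot, $text));        // only tokens that contain a dot
```

Variant A treats every word as a match, because nothing in it distinguishes a URL from ordinary text once the scheme is optional. Requiring a dot is a simple heuristic that filters those words out.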

It also works with IP addresses, because the pattern requires a dot. Tested with the strings "192.168.0.1" and "192.168.0.1/test/index.php".
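
The IP address check can be reproduced with a short sketch like the one below. It reuses the pattern from this answer unchanged; the loop and the variable names are assumptions added for illustration.

```php
<?php
// Same pattern as in the answer above.
$pattern = '$\b((https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*\.[-A-Z0-9+&@#/%=~_|]+$i';

foreach (["192.168.0.1", "192.168.0.1/test/index.php"] as $candidate) {
    preg_match_all($pattern, $candidate, $matches);
    // $matches[0] holds the full matches found in this candidate string.
    echo $candidate, ' => ', implode(', ', $matches[0]), PHP_EOL;
}
```

Note that the same property cuts both ways: since a dot is the only thing required, other dotted tokens in ordinary text (a file name like "index.php" on its own, or a version string like "v1.2") would be matched as well, so the results may need filtering depending on the input.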