Matching href URLs


Possible duplicate:
Grabbing the href attribute of an A element

I am trying to match the following in a page's source:

<a href="/download/blahbal.html">

I have looked at another question on this site and am using this regular expression:

'/<a href=["\']?(\/download\/[^"\'\s>]+)["\'\s>]?/i'

It returns all the href links on the page, but it drops the .html from some of the links.

Any help would be greatly appreciated.

Thanks


2011-09-01 Jamesmiller

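Perhaps the regex misses hrefs like that. In any case, I suggest you use a parser (DOMDocument) and retrieve all the "a" tags with it. – CaNNaDaRk

"it drops the .html from some of the links" - can you give an example where the .html goes missing? – FrankS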

Use XPath: /html/body//a[starts-with(@href, '/download')] – Gordon
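The XPath suggestion could be applied roughly as follows. This is only a sketch: the sample markup in $html is made up for illustration, and the expression has /@href appended so that the attribute nodes are returned directly.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<html><body>'
      . '<a href="/download/blahbal.html">one</a>'
      . '<a href="/about.html">two</a>'
      . '<a href="/download/other.html">three</a>'
      . '</body></html>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep parser warnings about sloppy HTML quiet
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$hrefs = array();
// starts-with() does the filtering; /@href selects the attribute nodes themselves.
foreach ($xpath->query("/html/body//a[starts-with(@href, '/download')]/@href") as $attr) {
    $hrefs[] = $attr->value;
}

print_r($hrefs);
```

Note that '/download' would also match a path such as /downloads/...; using '/download/' in the expression is stricter if only that directory is wanted.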

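First retrieve all the hrefs using the method described here; then you can use a regex or strpos to filter out those that do not start with /download/.
Many other posts on Stack Overflow (see this) discuss why you should use a parser rather than regular expressions. Once you have parsed the document and obtained the hrefs, you can filter them with simple functions.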

Some code:

$dom = new DOMDocument;
// suppress the warnings that malformed real-world HTML would otherwise trigger
libxml_use_internal_errors(true);
// $html is a string containing your HTML
$dom->loadHTML($html);
// by the end of the loop this holds the filtered hrefs
$hrefs = array();
foreach ($dom->getElementsByTagName('a') as $node) {
    // skip anchors that have no href attribute
    if ($node->hasAttribute('href')) {
        $href = $node->getAttribute('href');
        // keep only hrefs that start with /download/
        if (strpos($href, '/download/') === 0) {
            $hrefs[] = $href; // store href
        }
    }
}


2011-09-01 10:07:16 CaNNaDaRk

Tested, it works. If necessary, strpos can easily be replaced by a regular expression (preg_match). – CaNNaDaRk
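A minimal sketch of that swap, assuming the hrefs have already been collected from the DOM. The $allHrefs list is hypothetical sample data, not taken from the question.

```php
<?php
// Hypothetical hrefs, as the DOM loop might have collected them before filtering.
$allHrefs = array(
    '/download/blahbal.html',
    '/about.html',
    'http://example.com/download/x.html',
);

$hrefs = array();
foreach ($allHrefs as $href) {
    // ^ anchors the pattern at the start of the string, mirroring strpos(...) === 0.
    // Using # as the delimiter means the slashes need no escaping.
    if (preg_match('#^/download/#', $href) === 1) {
        $hrefs[] = $href;
    }
}

print_r($hrefs);
```

For a fixed prefix like this, strpos is the simpler choice; a pattern only becomes worthwhile when the condition grows, for example to require a .html suffix as well.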

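Thanks. I am still curious whether it can be done with a regex, though. – Jamesmiller

That depends on which links are missing from the match; perhaps the regex only needs a small tweak. – CaNNaDaRk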
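The thread never establishes which links lose their .html, so the following only illustrates one hypothetical cause: the character class [^"\'\s>] stops at whitespace, so an href containing a space is cut short. The second link in $html and the adjusted pattern in $tweaked are assumptions made up for this sketch.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/<a href=["\']?(\/download\/[^"\'\s>]+)["\'\s>]?/i';

// First link is the one from the question; the second is a hypothetical
// href with a space in it.
$html = '<a href="/download/blahbal.html">ok</a>'
      . '<a href="/download/my file.html">space in path</a>';

preg_match_all($pattern, $html, $matches);
$original = $matches[1];
print_r($original); // the second capture stops at the space

// One possible tweak (an assumption, not the asker's final pattern):
// remember the opening quote and capture lazily up to the same quote.
$tweaked = '/<a\s[^>]*?href=(["\'])(\/download\/.*?)\1/i';
preg_match_all($tweaked, $html, $matches);
$fixed = $matches[2];
print_r($fixed);
```

The tweaked pattern requires quoted attribute values, which is one more reason the parser-based approach in the answer is the more robust route.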
