Regex: find all links on a page w/nofollow

I'm trying to write a RegEx that finds all links on a web page that have a rel="nofollow" attribute. Mind you, I'm a regex newbie, so please don't be too harsh on me :)

Here's how far I've gotten:

$link = "/<a href=\"([^\"]*)\" rel=\"nofollow\">(.*)<\/a>/iU";

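Obviously, this is very wrong. Any link that has other attributes, or is written in a slightly different style (single quotes, for example), won't match.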
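To make the failure concrete, here is a small sketch that runs the question's pattern, unchanged, through PHP's `preg_match_all` against three sample links. The sample HTML is made up for illustration. Only the link that has exactly `href` followed by `rel`, both in double quotes, gets matched.

```php
<?php
// The pattern from the question, unchanged.
$link = "/<a href=\"([^\"]*)\" rel=\"nofollow\">(.*)<\/a>/iU";

// Hypothetical sample input: three nofollow links written in different styles.
$html = '<a href="/a" rel="nofollow">One</a>'
      . "<a href='/b' rel='nofollow'>Two</a>"            // single quotes
      . '<a class="x" href="/c" rel="nofollow">Three</a>'; // extra attribute first

// preg_match_all returns the number of full-pattern matches.
$count = preg_match_all($link, $html, $m);
echo $count, "\n"; // 1
print_r($m[1]);    // only "/a" was captured
```

The pattern hard-codes the attribute order, the quote character, and the absence of any other attribute, so every variation breaks it.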


2012-02-27 Linkjuice57

[Don't. Use. Regex. To. Parse. HTML.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) ... The pony, he comes. – rdlowrey 2012-02-27 20:55:51

You really should use a DOM parser for this; any regex-based solution to this kind of HTML parsing will be error-prone. Consider code like this:

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings caused by malformed real-world HTML
$doc->loadHTML($html); // $html holds the page source
$xpath = new DOMXPath($doc);
// returns a DOMNodeList of all <a> elements whose rel attribute is exactly "nofollow"
$nlist = $xpath->query("//a[@rel='nofollow']");
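A possible extension of the DOM approach, offered as a sketch rather than the answerer's code: `@rel='nofollow'` only matches when the attribute value is exactly `nofollow`, so a link with `rel="nofollow noopener"` is skipped. The XPath below pads the whitespace-normalized `rel` value with spaces and looks for the whole token ` nofollow `. The function name `findNofollowLinks` and the sample HTML are hypothetical, made up for this example.

```php
<?php
// Hypothetical helper: returns href and text of every <a> whose rel
// attribute contains the "nofollow" token, alone or among other tokens.
function findNofollowLinks(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();            // discard the collected parse errors

    $xpath = new DOMXPath($doc);
    // Padding with spaces makes " nofollow " match whole tokens only,
    // so rel="nofollowed" is not a false positive. Still case-sensitive.
    $query = "//a[contains(concat(' ', normalize-space(@rel), ' '), ' nofollow ')]";

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

// Hypothetical sample input.
$html = <<<'HTML'
<p>
  <a href="/a" rel="nofollow">One</a>
  <a class="x" href='/b' rel='nofollow noopener'>Two</a>
  <a href="/c">Three</a>
  <a href="/d" rel="nofollowed">Four</a>
</p>
HTML;

print_r(findNofollowLinks($html)); // entries for /a (One) and /b (Two)
```

Quoting style and attribute order stop mattering because the parser has already normalized them; only the XPath condition decides what counts as a nofollow link.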


2012-02-27 20:59:06 anubhava

You beat me to it! – cwallenpoole 2012-02-27 21:00:07

Thanks, I've added your example to http://htmlparsing.com/php.html – 2012-02-27 22:20:04

Try this:

$link = "/<(a)[^>]*rel\s*=\s*(['\"])nofollow\\2[^>]*>(.*?)<\/\\1>/i";
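A quick sketch of how this pattern might be used with `preg_match_all`; the sample HTML is made up for illustration. Group 2 captures whichever quote character opened the `rel` value and `\2` requires the same one to close it, so both single and double quotes work, and `[^>]*` on either side of `rel` tolerates other attributes. It still misses a multi-token value such as `rel="nofollow noopener"`, and because there is no `s` modifier, `.` does not match newlines, so link text spanning several lines will not match either.

```php
<?php
// The pattern from this answer, unchanged: group 2 is the quote
// character, group 3 is the link text.
$link = "/<(a)[^>]*rel\s*=\s*(['\"])nofollow\\2[^>]*>(.*?)<\/\\1>/i";

// Hypothetical sample input.
$html = '<a href="/a" rel="nofollow">One</a>'
      . "<a class='x' href='/b' rel='nofollow'>Two</a>"   // single quotes, extra attribute
      . '<a href="/c">Three</a>'                           // no rel: skipped
      . '<a href="/d" rel="nofollow noopener">Four</a>';   // multi-token rel: missed

preg_match_all($link, $html, $m);
print_r($m[3]); // link texts: One, Two
```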


2012-02-27 20:49:49
