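Rather than using RegEx to find the appropriate tags in the html, use DOMDocument & DOMXPath, which makes it fairly easy, as shown below.
The last line simply echoes the final edited html into a textarea, but you could just as easily save it to a file.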
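<?php
/* XPath expression to find all anchors whose href does not contain "#"
   (anchors with no href attribute at all also match) */
$query = '//a[not(contains(@href, "#"))]';
/* Some url */
$url = 'http://stackoverflow.com/questions/39737604/keeping-anchor-tags-and-removing-other-hyperlinks-php-regex';
/* get the data */
$html = file_get_contents($url);
/* construct DOMDocument & DOMXPath objects; silence libxml warnings about malformed real-world HTML */
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($dom);
/* Run the query */
$col = $xp->query($query);
/* Process all found nodes (query() returns false on an invalid expression, and empty() is never true for a DOMNodeList object) */
if ($col !== false && $col->length > 0) {
    /*
    Iterating backwards while removing nodes is a safe habit;
    it is essential for live lists such as getElementsByTagName().
    */
    for ($i = $col->length; --$i >= 0;) {
        $a = $col->item($i);
        $a->parentNode->removeChild($a);
    }
    /* do something with the processed html; escape it so it displays as text inside the textarea */
    echo "<textarea cols=150 rows=100>", htmlspecialchars($dom->saveHTML()), "</textarea>";
}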
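If you want to reuse this on an HTML string rather than a fetched URL, and save the result to a file instead of echoing it, the same DOMDocument/DOMXPath steps can be wrapped in a function. This is only a sketch: the function name `remove_links_without_hash`, the sample HTML and the output path (`edited.html` in the system temp directory) are made up for illustration. Note that removing an anchor also removes its link text.

```php
<?php
// Hypothetical helper name; applies the same XPath query and backwards
// removal loop as above, but to an HTML string passed in by the caller.
function remove_links_without_hash(string $html): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings about imperfect HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $col = $xp->query('//a[not(contains(@href, "#"))]');

    if ($col !== false) {
        // Remove each matched anchor (and its text) from the document.
        for ($i = $col->length - 1; $i >= 0; $i--) {
            $a = $col->item($i);
            $a->parentNode->removeChild($a);
        }
    }
    return $dom->saveHTML();
}

// Made-up sample input: one in-page link and one external link.
$html = '<p><a href="#top">Top</a> and <a href="http://example.com/">Example</a></p>';
$result = remove_links_without_hash($html);

// Save the edited HTML to a file (example path) instead of echoing it.
file_put_contents(sys_get_temp_dir() . '/edited.html', $result);
echo $result;
```

The `#top` anchor survives, while the `http://example.com/` anchor and its text are removed.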
Using 'DOMDocument' & 'DOMXPath' is easier than regex – RamRaider
Tried to expand on the solution a little –