Count HTML links in a string $html and append them as a list

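I store the content of a website in a string, $html.

I want to count all HTML links that point to a file of a particular format (.otf in the example below), append a list of those links to the end of $html, and remove the original links.

An example: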

<?php 
$html_input = ' 
<p> 
    Lorem <a href="font-1.otf">ipsum</a> dolor sit amet, 
    consectetur <a href="http://www.cnn.com">adipiscing</a> elit. 
    Quisque <a href="font-2.otf">ultricies</a> placerat massa 
    vel dictum. 
</p>';

// some magic here  

$html_output = ' 
<p> 
    Lorem ipsum dolor sit amet, 
    consectetur <a href="http://www.cnn.com">adipiscing</a> elit. 
    Quisque ultricies placerat massa 
    vel dictum. 
</p> 
<p>.otf-links: 2</p> 
<ul> 
    <li><a href="font-1.otf">ipsum</a></li> 
    <li><a href="font-2.otf">ultricies</a></li> 
</ul>';
?>

How should I do this? Should I use regular expressions, or is there another way?

Source

2010-02-19 snorpey

No, you should not use regex. See: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 A real answer is coming. – 2010-02-19 11:06:24

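require_once("simple_html_dom.php");

$doc = new simple_html_dom();
$doc->load($html_input); // the input string from the question

$fonts = array();
$links = $doc->find("a");

foreach ($links as $l) {
    // substr() with a negative offset copes with hrefs shorter than 4 characters
    if (substr($l->href, -4) == ".otf") {
        $fonts[] = $l->outertext;      // store the anchor's HTML string before changing it
        $l->outertext = $l->innertext; // replace the anchor with its link text
    }
}

$output = $doc->save() . "\n<p>.otf-links: " . count($fonts) . "</p>\n" .
    "<ul>\n\t<li>" . implode("</li>\n\t<li>", $fonts) . "</li>\n</ul>";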

Documentation for Simple HTML DOM: http://simplehtmldom.sourceforge.net/
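If adding a third-party library is not an option, roughly the same steps can be sketched with PHP's bundled DOM extension. This is only an illustration under a few assumptions: the function name move_otf_links is made up for this example, the input is wrapped in a temporary div so the fragment can be serialized again afterwards, and only hrefs ending in .otf are handled.

```php
<?php
// Hypothetical helper (not from the thread): moves .otf links out of an
// HTML fragment and appends them as a list, using the built-in DOMDocument.
function move_otf_links(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings quietly
    // The XML encoding hint makes the parser treat the input as UTF-8;
    // the wrapper div lets us serialize just the fragment afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $fonts = [];

    // Copy the live node list first, because the loop removes nodes.
    foreach (iterator_to_array($wrapper->getElementsByTagName('a')) as $a) {
        if (strtolower(substr($a->getAttribute('href'), -4)) !== '.otf') {
            continue;
        }
        $fonts[] = $doc->saveHTML($a);    // serialize before unwrapping
        while ($a->firstChild !== null) { // keep the link text in place
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);
    }

    // Serialize the children of the wrapper, i.e. the original fragment.
    $body = '';
    foreach ($wrapper->childNodes as $child) {
        $body .= $doc->saveHTML($child);
    }

    $items = '';
    foreach ($fonts as $font) {
        $items .= "\t<li>" . $font . "</li>\n";
    }

    return $body . "\n<p>.otf-links: " . count($fonts) . "</p>\n<ul>\n" . $items . "</ul>";
}

$html_input = '<p>Lorem <a href="font-1.otf">ipsum</a> dolor sit amet, '
    . 'consectetur <a href="http://www.cnn.com">adipiscing</a> elit. '
    . 'Quisque <a href="font-2.otf">ultricies</a> placerat massa.</p>';

echo move_otf_links($html_input), "\n";
```

The anchor is serialized before it is unwrapped for the same reason as in the answer above: once the node is removed, there is nothing left to put in the list.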

Source

2010-02-19 11:17:41

+1. Less thrown together than mine. Fixed an issue that could cause the script to fail if the href is shorter than 4 characters. – Yacoby 2010-02-19 11:32:15

Thanks for your effort. This is almost what I want, except that it also removes the anchor tags in the list. Swapping _$l->outertext = $l->innertext;_ and _$fonts[] = $l;_ didn't help, so how can I fix this? – snorpey 2010-02-19 14:00:07

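@Yacoby Thanks, mate; however, 'substr' carries on quietly without an error even if the string length is 0, so the check isn't necessary. @snorpey I fixed the problem. Remember that objects in PHP are assigned by reference unless you explicitly clone them. The fix is to assign the anchor object's actual string representation to '$fonts[]' before changing it. – 2010-02-19 18:42:20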
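To illustrate the point about object assignment, here is a small sketch that uses a plain stdClass object as a stand-in for the Simple HTML DOM anchor node (the stand-in and the variable names are assumptions for this example only). Storing the object keeps a handle to the same instance, so a later change shows up in the stored copy, while storing the string keeps the original markup.

```php
<?php
// Stand-in for an anchor node; only the outertext property matters here.
$anchor = new stdClass();
$anchor->outertext = '<a href="font-1.otf">ipsum</a>';

$byHandle = [$anchor];            // stores a handle to the same object
$asString = [$anchor->outertext]; // stores a copy of the string

// Later change, as in the loop that strips the anchor tag.
$anchor->outertext = 'ipsum';

echo $byHandle[0]->outertext, "\n"; // prints: ipsum
echo $asString[0], "\n";            // prints: <a href="font-1.otf">ipsum</a>
```

This is why swapping the two statements in the loop does not help when the object itself is stored, and why storing $l->outertext first does.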

Use a DOM parser

Example:

$h = str_get_html($html);

$linkCount = count($h->find('a')); // counts every link, not only the .otf ones

foreach ($h->find('a') as $a) {
    // print every link ending in .otf
    // ends_with() is not a built-in function, but it is trivial to write
    if (ends_with(strtolower($a->href), '.otf')) {
        echo '<li><a href="' . $a->href . '">' . $a->innertext . '</a></li>';
    }
}
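Since ends_with is left to the reader, here is one minimal way it could be written. The signature is an assumption based on how the helper is called above; on PHP 8.0 and later the built-in str_ends_with does the same job.

```php
<?php
// Hypothetical helper: true when $haystack ends with $needle.
function ends_with(string $haystack, string $needle): bool
{
    $length = strlen($needle);
    if ($length === 0) {
        return true; // every string ends with the empty string
    }
    // A negative offset takes the last $length bytes; if the haystack is
    // shorter than the needle, the comparison simply fails.
    return substr($haystack, -$length) === $needle;
}

var_dump(ends_with(strtolower('Font-1.OTF'), '.otf')); // bool(true)
var_dump(ends_with('a', '.otf'));                      // bool(false)
```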

Source

2010-02-19 11:04:13 Yacoby

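+1 for recommending a DOM parser – marcgg 2010-02-19 11:15:02

I love Simple HTML DOM! You beat me to it, but you left out the part about removing the anchor tags from the original input. – 2010-02-19 11:23:14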

-1

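// Collect every .otf anchor; preg_match_all (unlike preg_match) returns all matches.
preg_match_all('~<a href="[^"]+\.otf">(.*?)</a>~s', $html_input, $matches);
$linksCount = count($matches[0]);
// Replace each anchor with its link text; preg_replace returns the new string.
$html_output = preg_replace('~<a href="[^"]+\.otf">(.*?)</a>~s', '$1', $html_input);
$html_output .= '<p>.otf-links: ' . $linksCount . '</p>';
$html_output .= '<ul><li>' . implode('</li><li>', $matches[0]) . '</li></ul>';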

Source

2010-02-19 11:07:45

We all know what happens if you parse HTML with regex... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – marcgg 2010-02-19 11:14:46

I even posted a warning comment for the OP, for example. – 2010-02-19 11:23:36
