I'm no regular-expression expert, but if I were in your shoes, I would do it like this:
<?php
// SIMULATED SAMPLE HTML CONTENT - WITH ATTRIBUTES:
$contents = '<section id="id-1">And even when darkness covers your path and no one is there to lend a hand;
<h3 class="class-1">Always remember that <em>There is always light at the end of the Tunnel <span class="class-2">if you can but hang on to your Faith!</span></em></h3>
<div>Now; let no one deceive you: <h2 class="class-2">You will be tried in ever ways - sometimes beyond your limits...</h2></div>
<article>But hang on because You are the Voice... You are the Light and you shall rule your Destiny because it is all about<h6 class="class4">YOU - THE REAL YOU!!!</h6></article>
</section>';
// SPLIT THE CONTENT AT EACH CLOSING </h[1-6]> TAG
$parts   = preg_split("%<\/h[1-6]>%si", $contents);
$matches = array();
// LOOP THROUGH $parts AND ADD EVERY PART THAT CONTAINS AN OPENING <h[1-6]> TAG TO THE $matches ARRAY.
foreach ($parts as $part) {
    if (preg_match("%(.*|.?)(<h)([1-6])%si", $part)) {
        // DROP EVERYTHING BEFORE THE OPENING TAG, THEN RE-APPEND THE CLOSING TAG THAT preg_split REMOVED.
        $matches[] = preg_replace("%(.*|.?)(<)(h[1-6])(.*)%si", "$2$3$4$2/$3>", $part);
    }
}
var_dump($matches);
// DUMPS:
// array (size=3)
//   0 => string '<h3 class="class-1">Always remember that <em>There is always light at the end of the Tunnel <span class="class-2">if you can but hang on to your Faith!</span></em></h3>' (length=168)
//   1 => string '<h2 class="class-2">You will be tried in ever ways - sometimes beyond your limits...</h2>' (length=89)
//   2 => string '<h6 class="class4">YOU - THE REAL YOU!!!</h6>' (length=45)
As a function, this is what it boils down to:
<?php
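// RETURNS AN ARRAY OF ALL <h1>...<h6> ELEMENTS (TAGS INCLUDED) FOUND IN THE GIVEN HTML STRING.
function pseudoMatchHTags($htmlContentWithHTags){
    // SPLIT AT EACH CLOSING </h[1-6]> TAG
    $parts   = preg_split("%<\/h[1-6]>%si", $htmlContentWithHTags);
    $matches = array();
    foreach ($parts as $part) {
        if (preg_match("%(.*|.?)(<h)([1-6])%si", $part)) {
            // KEEP ONLY THE HEADING AND RE-APPEND ITS CLOSING TAG
            $matches[] = preg_replace("%(.*|.?)(<)(h[1-6])(.*)%si", "$2$3$4$2/$3>", $part);
        }
    }
    return $matches;
}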
var_dump(pseudoMatchHTags($contents));
You can test it here: https://eval.in/571312 ... maybe it helps a bit... I hope... ;-)
[Have you tried using a DOM parser?](http://stackoverflow.com/a/1732454/511529) – GolezTrol
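This will fail if the `h`s have any attributes. The `.*` is also greedy; if you have more than one heading on a page, it will eat everything. A parser is your best approach. Take a look at http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php – chris85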
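To illustrate the greediness point, here is a minimal sketch. The `$html` sample and both patterns are made up for this illustration and are not taken from the question; it only shows how a greedy `.*` swallows two headings into one match, while a lazy `.*?` stops at the first closing tag.

```php
<?php
// Hypothetical sample: two headings on the same "page".
$html = '<h2>First</h2><p>text</p><h3>Second</h3>';

// Greedy: .* runs on to the LAST closing heading tag,
// so both headings (and the paragraph between them) land in one match.
preg_match_all('%<h[1-6]>.*</h[1-6]>%s', $html, $greedy);

// Lazy: .*? stops at the FIRST closing heading tag,
// which gives one match per heading here.
preg_match_all('%<h[1-6]>.*?</h[1-6]>%s', $html, $lazy);

var_dump(count($greedy[0])); // int(1)
var_dump(count($lazy[0]));   // int(2)
```

Even the lazy version is only a stopgap: it does not check that the opening and closing tags have the same level, which is one more reason to prefer a parser.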
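As the other posts say, don't parse HTML with regular expressions unless your HTML is simple and you don't need to search nested tags. Even then, it's a bad idea. There are DOM parsers ([DOMDocument](https://php.net/domdocument)) for parsing HTML, and they are easy to work with. They have several of the same methods that are available in JS, like `getElementsByTagName`, which can be used to find every tag with a given name. –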
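For what it's worth, a rough sketch of that DOM approach could look like the following. The function name `findHeadings` and the sample markup are hypothetical, chosen only for this illustration. It asks `getElementsByTagName('*')` for every element and keeps those named `h1` to `h6`, so the headings come back in document order.

```php
<?php
// Hypothetical helper (not from the original answer): collect all
// <h1>...<h6> elements of an HTML fragment using DOMDocument.
function findHeadings(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them
    // (libxml complains about HTML5 tags such as <section>).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $headings = array();
    // '*' matches every element; iteration follows document order.
    foreach ($doc->getElementsByTagName('*') as $node) {
        if (preg_match('/^h[1-6]$/i', $node->nodeName)) {
            // Serialize only this node, including its tags and children.
            $headings[] = $doc->saveHTML($node);
        }
    }
    return $headings;
}

$sample = '<div><h2 class="a">One</h2><p>x</p><h3>Two <em>nested</em></h3></div>';
var_dump(findHeadings($sample));
```

The regular expression here only tests a tag name that the parser has already isolated, so attributes and nested markup inside the heading no longer matter.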