2017-02-21 68 views

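PHP DOMDocument saveHTML() saves only the first <img> image

I have the following problem. When the HTML starts with an <img> tag and I serialize it with $dom->saveHTML(), I get back only the first image. However, when I add any string before the <img> tag, the resulting HTML contains extra <p></p> tags. Why is that?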

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>'; 

$h = 'abc<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>'; 

The two strings above are example inputs.

<?php 

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>'; 

echo 'start<br />';
echo htmlspecialchars($h);
echo '<br />end<br />';

$dom = new DOMDocument();
$dom->loadHTML($h, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    $img_class = $image->getAttribute('class');

    if ($img_class == '') {
        $image->setAttribute('class', 'img-responsive img-rounded');
        echo 'add class <br />';
    }
}

$my_post_content = $dom->saveHTML();

echo 'start<br />';
echo htmlspecialchars($my_post_content);
echo '<br />end<br />';

Answer


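Hello! I ran some tests on your script, and it seems the second image disappears because of the LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD flags that you pass in $dom->loadHTML($h, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);. With LIBXML_HTML_NOIMPLIED the parser does not add the implied <html> and <body> elements, so your fragment has no single root element that could hold both images.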
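To see what the parser actually built, it can help to print the name of the document's root element with and without the flags. The following is only a diagnostic sketch: `rootElementName()` is a hypothetical helper that does not appear in the original script. With default parsing the root is the implied `html` element; the results for the flagged calls are deliberately left for you to observe, since they depend on how libxml handles a fragment without a wrapper.

```php
<?php
// Hypothetical helper (not part of the original script): returns the
// name of the element that ended up as the document root after parsing.
function rootElementName(string $html, int $flags = 0): string
{
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html, $flags);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // documentElement is null for an empty document.
    return $dom->documentElement !== null ? $dom->documentElement->nodeName : '';
}

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';
$noWrappers = LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD;

echo rootElementName($h), "\n";                      // default parsing
echo rootElementName($h, $noWrappers), "\n";         // flags, fragment starts with <img>
echo rootElementName('abc' . $h, $noWrappers), "\n"; // flags, fragment starts with text
```

If the flagged calls report something other than a wrapper element, that root is what saveHTML() serializes, which would explain both the missing image and the extra <p></p>.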

One simple workaround would be a "hack" that uses something like this:

$h = 'abc<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';

and then manually cuts the unwanted parts out of the resulting string. Here is a better solution, though:

<?php

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';

echo 'start<br />';
echo htmlspecialchars($h);
echo '<br />end<br />';

// A blank document is used because we want to extract only the
// HTML inside <body> from $dom.
$blank = new DOMDocument();

// Initialize the $dom object. This is the same code as before, except
// that loadHTML() is now called without the LIBXML_* flags, so the
// parser adds the implied <html> and <body> elements.
$dom = new DOMDocument();
$dom->loadHTML($h);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    $img_class = $image->getAttribute('class');

    // Only add the classes to images that do not have a class yet.
    if ($img_class == '') {
        $image->setAttribute('class', 'img-responsive img-rounded');
        echo 'add class <br />';
    }
}

// Now get the <body>, which contains the updated HTML,
// and copy all of its children into the blank document.
$body = $dom->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $blank->appendChild($blank->importNode($child, true));
}

$my_post_content = $blank->saveHTML($blank);

echo 'start<br />';
echo htmlspecialchars($my_post_content);
echo '<br />end<br />';
exit;

The output will be:

start 
<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br> 
end 
add class 
add class 
start 
<img src="https://example.com/one.jpg" alt="" class="img-responsive img-rounded"><br><p>bla</p><img src="https://example.com/foo.jpg" alt="" class="img-responsive img-rounded"><br> 
end 

As you can see, both of your images are there.
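A variant of the same idea, shown here only as a sketch: instead of copying the <body> children into a second document, each child can be serialized directly by passing it to saveHTML(). The helper name `addImageClasses()` and its `$classes` parameter are made up for this illustration. Depending on the libxml version, the result may contain a line break after block elements such as <p>.

```php
<?php
// Hypothetical helper: adds $classes to every <img> that has no class
// attribute and returns the inner HTML of <body>.
function addImageClasses(string $html, string $classes = 'img-responsive img-rounded'): string
{
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // No LIBXML_* flags: the parser adds the implied <html> and <body>.
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $image) {
        if ($image->getAttribute('class') === '') {
            $image->setAttribute('class', $classes);
        }
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> on its own, so the <html> and
    // <body> wrappers never appear in the result.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    return $out;
}

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';
echo addImageClasses($h);
```

This keeps everything in one document, at the cost of building the string by hand.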

Cheers!