2017-03-05 61 views

DOMDocument::loadHTML() is converting HTML entities

A related question is "Preventing DOMDocument::loadHTML() from converting entities", but it did not produce a solution.

This code:

$html = "<span>&#x1F183;&#x1F174;&#x1F182;&#x1F183;</span>"; 
$doc = new DOMDocument; 
$doc->resolveExternals = false; 
$doc->substituteEntities = false; 
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('span') as $node) 
{ 
    var_dump($node->nodeValue); 
    var_dump(htmlentities($node->nodeValue)); 
    var_dump(htmlentities(iconv('UTF-8', 'ISO-8859-1', $node->nodeValue))); 
} 

produces the following output:

string(16) "🆃🅴🆂🆃"
string(16) "🆃🅴🆂🆃"
string(0) "" 

But what I want is &#x1F183;&#x1F174;&#x1F182;&#x1F183;

I am running PHP version 5.6.29, and ini_get("default_charset") returns UTF-8.

Answer


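Reading more about http://php.net/manual/en/function.htmlentities.php, I noticed that htmlentities() does not encode all Unicode characters. Someone posted a superentities function in the comments, but it did not seem to work for me. The UTF8entities function did work.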
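The gap is easy to see with a small check. This is only a sketch: it assumes UTF-8 input, the variable names are made up for the example, and the byte escapes spell U+00E9 and U+1F183. htmlentities() only converts characters that have a named entity, so the second string comes back unchanged.

```php
<?php
// Assumes UTF-8 input; the encoding is passed explicitly so the result
// does not depend on the default_charset ini setting.
$named   = "\xC3\xA9";          // U+00E9, which has a named entity (&eacute;)
$unnamed = "\xF0\x9F\x86\x83";  // U+1F183, which has no named entity

var_dump(htmlentities($named, ENT_QUOTES, 'UTF-8'));                // string(8) "&eacute;"
var_dump(htmlentities($unnamed, ENT_QUOTES, 'UTF-8') === $unnamed); // bool(true)
```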

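Here are two functions that I took from the comments section and modified. The result is not exactly what I wanted, but it does give me HTML-encoded values.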

$html = "<span>&#x1F183;&#x1F174;&#x1F182;&#x1F183;</span>"; 
$doc = new DOMDocument; 
$doc->resolveExternals = false; 
$doc->substituteEntities = false; 
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('span') as $node) 
{ 
    var_dump(UTF8entities($node->nodeValue)); 
} 


function UTF8entities($content = "") {
    // Split the UTF-8 string into an array of single (possibly multi-byte) characters.
    $characterArray = preg_split('/(?<!^)(?!$)/u', $content);
    $rv = ""; // initialise the result so the first .= does not raise a notice
    foreach ($characterArray as $character) {
        $rv .= unicode_entity_replace($character);
    }
    return $rv;
}

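function unicode_entity_replace($c) { // m. perez
    if ($c === "") {
        return $c; // nothing to encode
    }
    $h = ord($c[0]); // lead byte; $c[0] replaces the $c{0} syntax removed in PHP 8
    if ($h <= 0x7F) {
        return $c; // plain ASCII
    } else if ($h < 0xC2) {
        return $c; // not a valid lead byte, left untouched
    }

    if ($h <= 0xDF) {
        // two-byte sequence: 5 bits from the lead byte, 6 from the continuation byte
        $h = ($h & 0x1F) << 6 | (ord($c[1]) & 0x3F);
        return "&#" . $h . ";";
    } else if ($h <= 0xEF) {
        // three-byte sequence
        $h = ($h & 0x0F) << 12 | (ord($c[1]) & 0x3F) << 6 | (ord($c[2]) & 0x3F);
        return "&#" . $h . ";";
    } else if ($h <= 0xF4) {
        // four-byte sequence: only the low 3 bits of the lead byte carry data
        $h = ($h & 0x07) << 18 | (ord($c[1]) & 0x3F) << 12 | (ord($c[2]) & 0x3F) << 6 | (ord($c[3]) & 0x3F);
        return "&#" . $h . ";";
    }
    return $c; // lead bytes above 0xF4 are not valid UTF-8
}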

This returns:

string(36) "&#127363;&#127348;&#127362;&#127363;"
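
The question asked for hexadecimal references (&#x1F183;) rather than decimal ones. One possible way to get them is to convert each non-ASCII character to UTF-32BE and print its code point in hex. This is only a sketch: it assumes the input is valid UTF-8 and that the iconv extension is available (the question already calls iconv). The name hexEntities is made up for this example and is not part of PHP or of the comments on the manual page.

```php
<?php
// Hypothetical helper (not from the answer above): replace every non-ASCII
// character in a UTF-8 string with a hexadecimal numeric character reference.
function hexEntities($content)
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u', // with the u modifier, each match is one whole code point
        function ($m) {
            // UTF-32BE stores the code point as a 32-bit big-endian integer.
            $parts = unpack('N', iconv('UTF-8', 'UTF-32BE', $m[0]));
            return sprintf('&#x%X;', $parts[1]);
        },
        $content
    );
}

// "\xF0\x9F\x86\x83" is U+1F183 written as raw UTF-8 bytes.
var_dump(hexEntities("<\xF0\x9F\x86\x83>"));
// string(11) "<&#x1F183;>"
```

In the loop above, calling hexEntities($node->nodeValue) in place of UTF8entities($node->nodeValue) should then give the hexadecimal form the question asked for.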