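I need to truncate a string to a specified length while ignoring HTML tags. I found a suitable function here. PHP: truncating HTML with UTF-8.
So I made a few small changes, adding output buffering with ob_start();
The problem is with UTF-8. If the last character of the truncated string is one of [ą, č, ę, ė, į, š, ų, ū, ž], I get the replacement character U+FFFD (�) at the end of the string.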
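The symptom can be reproduced in isolation with a few lines. This is only a minimal sketch, assuming the file is saved as UTF-8 and the mbstring extension is loaded; the variable names are made up for the illustration and do not appear in the function.

```php
<?php
// Illustration only: assumes UTF-8 source encoding and the mbstring extension.
$word = 'Lietuviškas';

// strlen() counts bytes; "š" occupies two bytes in UTF-8.
$byteLength = strlen($word);
// mb_strlen() counts characters when told the encoding.
$charLength = mb_strlen($word, 'UTF-8');

// Cutting after 8 bytes keeps only the first byte of "š".
$cut = substr($word, 0, 8);
// The result is no longer valid UTF-8, so a browser shows U+FFFD in its place.
$isValid = mb_check_encoding($cut, 'UTF-8');

printf("bytes=%d chars=%d valid=%s\n", $byteLength, $charLength, $isValid ? 'yes' : 'no');
```

With this input the byte count and the character count differ by one, and the cut string fails the encoding check.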
Here is my code. You can copy it, paste it, and try it yourself:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>String truncate</title>
</head>
<?php
$html = '<b>Koks nors tekstas</b>. <p>Lietuviškas žodis.</p>';
$html = html_truncate(27, $html);
echo $html;
/* Truncate HTML, close opened tags
*
* @param int, maxlength of the string
* @param string, html
* @return $html
*/
function html_truncate($maxLength, $html){
    $printedLength = 0;
    $position = 0;
    $tags = array();
    ob_start();
    while ($printedLength < $maxLength && preg_match('{</?([a-z]+)[^>]*>|&#?[a-zA-Z0-9]+;}', $html, $match, PREG_OFFSET_CAPTURE, $position)){
        list($tag, $tagPosition) = $match[0];
        // Print text leading up to the tag.
        $str = substr($html, $position, $tagPosition - $position);
        if ($printedLength + strlen($str) > $maxLength){
            print(substr($str, 0, $maxLength - $printedLength));
            $printedLength = $maxLength;
            break;
        }
        print($str);
        $printedLength += strlen($str);
        if ($tag[0] == '&'){
            // Handle the entity.
            print($tag);
            $printedLength++;
        }
        else{
            // Handle the tag.
            $tagName = $match[1][0];
            if ($tag[1] == '/'){
                // This is a closing tag.
                $openingTag = array_pop($tags);
                assert($openingTag == $tagName); // check that tags are properly nested.
                print($tag);
            }
            else if ($tag[strlen($tag) - 2] == '/'){
                // Self-closing tag.
                print($tag);
            }
            else{
                // Opening tag.
                print($tag);
                $tags[] = $tagName;
            }
        }
        // Continue after the tag.
        $position = $tagPosition + strlen($tag);
    }
    // Print any remaining text.
    if ($printedLength < $maxLength && $position < strlen($html))
        print(substr($html, $position, $maxLength - $printedLength));
    // Close any open tags.
    while (!empty($tags))
        printf('</%s>', array_pop($tags));
    $bufferOuput = ob_get_contents();
    ob_end_clean();
    $html = $bufferOuput;
    return $html;
}
?>
<body>
</body>
</html>
The result of this function looks like this:
Koks nors tekstas. Lietuvi�
Any ideas why this function messes up UTF-8?
Possible duplicate of [UTF-8 compatible truncate function](http://stackoverflow.com/questions/6288875/utf-8-compatible-truncate-function) – user
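For reference, a character-aware cut of a plain-text fragment might look like the sketch below. The helper name truncate_utf8 is hypothetical and is not part of the function above; the sketch assumes UTF-8 input and that the mbstring extension is available.

```php
<?php
// Hypothetical helper for illustration; not taken from the original function.
// Assumes UTF-8 input and the mbstring extension.
function truncate_utf8($text, $maxChars)
{
    // mb_substr() counts characters rather than bytes,
    // so it never splits a multi-byte sequence in the middle.
    return mb_substr($text, 0, $maxChars, 'UTF-8');
}

$piece = truncate_utf8('Lietuviškas žodis.', 8);
echo $piece, "\n";
```

If the same idea were applied inside html_truncate, only the length counting and the final cut of each text fragment would presumably switch to mb_strlen and mb_substr. The offsets returned with PREG_OFFSET_CAPTURE are byte offsets, so the substr and strlen calls that work with $position and $tagPosition would have to stay byte-based.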