Limit text to a certain number of characters, ignoring HTML tags/attributes

I have a block of text like this:

<p class="post">Lorem ipsum dolor sit amet, <a href="http://website.com/link" target="_blank" title="hello">consectetur adipiscing elit</a>. Pellentesque vehicula tortor eget tortor fermentum bibendum. Duis mollis nisl et metus vulputate, a aliquam quam pharetra. <a href="http://website.com/link" target="_blank" title="hello">consectetur adipiscing elit</a> quis hendrerit nibh ultrices eget. <span class="highlight">Praesent</span> eu mollis lectus, sed convallis quam.</p>

I want to truncate the text after 100 characters. With a plain text string, I would use something like:

var new_string = text_string.substring(0,100);

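But I need to count the characters so that the text is truncated after 100 visible characters, taking the links and other HTML in the text into account. The limit is not 100 characters of the HTML itself, and the HTML markup in the text must be preserved.

Note: I can't leave any HTML tags open, so I need to either avoid truncating before a tag has been closed, or truncate the text and then add the correct closing tags.

Is this possible?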

Source

2016-12-16 John

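You can traverse the nodes in document order, and whenever you reach a text node, check how many characters it has. Keep a running total; when you reach the node that takes you past the maximum, truncate there, then empty every subsequent text node. – 2016-12-16 22:13:16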
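To make the comment above concrete, here is a minimal sketch of that traversal. It is one possible reading of the suggestion, not code from the commenter: `truncateTextNodes` is a made-up helper name, and it relies only on the standard `nodeType`, `nodeValue` and `childNodes` properties of DOM nodes.

```javascript
// Hypothetical helper (not a browser API): keep at most `max` visible
// characters in the subtree under `root`, editing the text nodes in place.
function truncateTextNodes(root, max) {
  var remaining = max; // visible characters still allowed

  function visit(node) {
    if (node.nodeType === 3) { // 3 === Node.TEXT_NODE
      var text = node.nodeValue;
      if (text.length > remaining) {
        // This node crosses the limit: cut it. Once `remaining` is 0,
        // every later non-empty text node is cut down to ''.
        node.nodeValue = text.slice(0, remaining);
        remaining = 0;
      } else {
        remaining -= text.length;
      }
      return;
    }
    // Element (or other container) node: visit children in document order.
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      visit(children[i]);
    }
  }

  visit(root);
  return root;
}

// Browser-only usage; skipped where no `document` exists.
if (typeof document !== 'undefined') {
  var container = document.createElement('div');
  container.innerHTML = '<p>Hello <b>big</b> world</p>';
  truncateTextNodes(container, 8);
  console.log(container.innerHTML); // <p>Hello <b>bi</b></p>
}
```

Because only the text nodes are edited, every element stays properly closed. Elements that come after the cut remain in the tree with empty text, so empty `<a>` or `<span>` tags may still need to be removed afterwards.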

You can run a regular expression to find all the text between > and <. – Alon
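As an illustration of that idea (a sketch only, with the hypothetical helper name `textBetweenTags` and a shortened sample string), a regular expression can collect each run of text that sits between a `>` and the next `<`:

```javascript
// Hypothetical helper: return the text runs found between ">" and "<".
// Assumes well-formed markup in which literal "<" and ">" in the text
// are escaped as &lt; and &gt;.
function textBetweenTags(html) {
  var pattern = />([^<>]+)(?=<)/g; // capture the text, leave the "<" unconsumed
  var runs = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    runs.push(match[1]);
  }
  return runs;
}

var sample = '<p class="post">Lorem ipsum, <a href="#">dolor</a> sit amet.</p>';
var runs = textBetweenTags(sample);
console.log(runs); // the three runs: "Lorem ipsum, ", "dolor", " sit amet."
```

Summing the lengths of these runs gives the visible character count. Text before the first tag or after the last one is not matched by this pattern, and an entity such as `&amp;` counts as five characters.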

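Do you want to strip the HTML, or truncate the text and leave the HTML in place? This is usually done after stripping the HTML, because counting only the text while still ending up with valid HTML, without a bunch of empty tags or formatting that could break the layout, is not easy. –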

Strip all the HTML tags from the string with a regular expression, then take the substring:

var new_string = text_string.replace(/<[^>]*>/g, "").substring(0,100);

[UPDATE] I have now read that the HTML must be preserved. The only solution I can think of is this:

var regx = /(<[^>]*>)/; // no "g" flag, so regx.test() keeps no state between calls
var counter = 0;

// split the string into an array, using the HTML tags as delimiters and keeping them as array elements
var strArray = str.split(regx);

for (var i = 0, len = strArray.length; i < len; i++) {
    // skip the array elements that are HTML tags
    if (regx.test(strArray[i])) {
        continue;
    }
    if (counter == 100) {
        // the limit has already been reached: empty this text element
        // (emptying instead of splicing keeps the array indexes stable)
        strArray[i] = '';
        continue;
    }
    // otherwise add this element's length to the counter
    counter = counter + strArray[i].length;
    // if that goes over 100, trim this element so the total is exactly 100 and pin the counter at 100
    if (counter > 100) {
        var diff = counter - 100;
        strArray[i] = strArray[i].slice(0, -diff);
        counter = 100;
    }
}

// build the new string from the array
var new_string = strArray.join('');

// remove the empty elements left behind by the truncation; repeat until nothing changes
// so that nested empty elements such as <b><i></i></b> are removed too
var previous;
do {
    previous = new_string;
    new_string = new_string.replace(/<([a-z][a-z0-9]*)\b[^>]*><\/\1>/gi, '');
} while (new_string !== previous);

Live example on CodePen

Source

2016-12-16 22:28:57 Davebra

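Thanks for your reply. The problem is that I need to keep any HTML tags in the text, not just strip them and truncate the text. – John

Sorry, I hadn't read that. The only solution I can think of is to split the string into an array, using a regular expression for HTML tags as the "splitter", then run a for loop with a counter variable that counts characters only in the elements containing text, and remove the text elements once the counter reaches 100. I have posted the code with comments. – Davebra

This is exactly what I need, and it looks perfect! You're the man! Thank you very much! – John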

One way to do it:

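var html = 'YOUR HTML STRING';
var elt = document.createElement('div'); // detached container, never added to the page
elt.innerHTML = html;                    // let the browser parse the markup
var text = elt.textContent;              // visible text only, tags dropped
var result = text.substring(0, 100);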

Source

2016-12-16 22:21:01 IAmDranged

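Thanks for your reply. The problem is that I need to keep any HTML tags in the text, not just strip them and truncate the text. – John

If str is your string, you can use this to get all of the text: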

var str = '<p class="post">Lorem ipsum dolor sit amet, <a href="http://website.com/link" target="_blank" title="hello">consectetur adipiscing elit</a>. Pellentesque vehicula tortor eget tortor fermentum bibendum. Duis mollis nisl et metus vulputate, a aliquam quam pharetra. <a href="http://website.com/link" target="_blank" title="hello">consectetur adipiscing elit</a> quis hendrerit nibh ultrices eget. <span class="highlight">Praesent</span> eu mollis lectus, sed convallis quam.</p>' 
 
var allTheText = str.replace(/<[^>]*>/g,"") 
 
console.log(allTheText.length)

Source

2016-12-16 22:21:40 Alon

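Thanks for your reply. The problem is that I need to keep any HTML tags in the text, not just strip them and truncate the text. – John

@john You can get the length of allTheText, find the character at which you want to cut, locate it in the original string, and delete everything after it. – Alon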
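One way to read this last suggestion, combined with the requirement that no tag is left open, is sketched below. It is not code from the thread: `truncateHtml` is a hypothetical helper, and it assumes well-formed markup in which `<` and `>` appear only as tag delimiters (no `>` inside attribute values or comments).

```javascript
// Hypothetical helper: cut `html` after `limit` visible characters, then
// append closing tags for whatever elements are still open at the cut.
function truncateHtml(html, limit) {
  // Elements that never take a closing tag.
  var voidTags = /^(area|base|br|col|embed|hr|img|input|link|meta|source|track|wbr)$/;
  var openTags = []; // stack of element names opened so far and not yet closed
  var visible = 0;   // visible characters counted so far
  var i = 0;         // position in the original string

  while (i < html.length && visible < limit) {
    if (html[i] === '<') {
      var end = html.indexOf('>', i);
      if (end === -1) break; // unterminated tag: stop before it
      var tag = html.slice(i + 1, end);
      var name = (tag.match(/^\/?\s*([a-zA-Z][a-zA-Z0-9]*)/) || [])[1];
      if (name) {
        name = name.toLowerCase();
        if (tag[0] === '/') {
          // closing tag: pop it if it matches the innermost open element
          if (openTags[openTags.length - 1] === name) openTags.pop();
        } else if (!voidTags.test(name) && tag[tag.length - 1] !== '/') {
          openTags.push(name);
        }
      }
      i = end + 1; // tags never count towards the limit
    } else {
      visible++;
      i++;
    }
  }

  var result = html.slice(0, i);
  while (openTags.length) {
    result += '</' + openTags.pop() + '>';
  }
  return result;
}

console.log(truncateHtml('<p>Hello <b>big</b> world</p>', 8));
// <p>Hello <b>bi</b></p>
```

Since the string is cut at the limit and only the still-open elements are closed, no empty elements are left behind after the cut. Entities such as `&amp;` are counted as several characters here, so the visible length may be slightly overestimated.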
