Cheerio: Extract text from HTML with separators

Let's say I have the following:

var cheerio = require('cheerio');
var $ = cheerio.load('<html><body><ul><li>One</li><li>Two</li></ul></body></html>');

var t = $('html').find('*').contents().filter(function() { 
    return this.type === 'text'; 
}).text();

I get:

OneTwo

Instead of:

One Two

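This is the same result I get if I do $('html').text(). So basically what I need is to inject a separator like ' ' (a space) or \n.

Note: this is not a front-end jQuery question; it's more of a Node.js question about Cheerio and server-side HTML parsing.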
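For reference, the desired behaviour can be sketched without Cheerio at all. The sketch below assumes nodes shaped roughly like the ones Cheerio's parser (domhandler) produces: text nodes with `type === 'text'` and a `data` string, element nodes with a `children` array. The tree is built by hand, and `collectText` is a hypothetical helper, not part of Cheerio's API.

```javascript
// Hypothetical helper: walk a domhandler-style tree depth-first and
// collect the `data` of every text node, in document order.
function collectText(node, out = []) {
  if (node.type === 'text') {
    out.push(node.data);
  } else if (Array.isArray(node.children)) {
    for (const child of node.children) {
      collectText(child, out);
    }
  }
  return out;
}

// Hand-built tree standing in for <ul><li>One</li><li>Two</li></ul>
// (assumed shape; a real Cheerio node carries more properties).
const ul = {
  type: 'tag',
  name: 'ul',
  children: [
    { type: 'tag', name: 'li', children: [{ type: 'text', data: 'One' }] },
    { type: 'tag', name: 'li', children: [{ type: 'text', data: 'Two' }] },
  ],
};

const texts = collectText(ul);
console.log(texts.join(''));   // "OneTwo" (the concatenated result)
console.log(texts.join(' '));  // "One Two"
console.log(texts.join('\n')); // prints One and Two on separate lines
```

The key point is that the text nodes are collected into an array first, so the separator is chosen at join time instead of being lost in a single concatenated string.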


2015-07-21 Crisboot

This seems to do the trick:

// Map each text node to its text and every other node to '', then join with spaces.
var t = $('html *').contents().map(function() {
    return (this.type === 'text') ? $(this).text() : '';
}).get().join(' ');

console.log(t);

Result:

One Two
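One caveat: the `''` returned for element nodes stays in the mapped array, so `join(' ')` also puts a separator after each of those empty entries. How many empty entries appear depends on the markup (and on whether the parser adds an implicit `<head>`), so the array below is only an illustrative assumption. Dropping empty entries before joining avoids the stray spaces:

```javascript
// Illustrative values only: assume the .map() callback returned '' for
// three element nodes and the text for two text nodes.
const mapped = ['', '', '', 'One', 'Two'];

// Joining as-is: every empty entry still gets a separator after it.
const naive = mapped.join(' ');
console.log(JSON.stringify(naive)); // "   One Two"

// Dropping empty entries first leaves only the real text nodes.
const cleaned = mapped.filter((s) => s !== '').join(' ');
console.log(JSON.stringify(cleaned)); // "One Two"
```

Within Cheerio itself, returning `null` instead of `''` from the `.map()` callback should have the same effect, since Cheerio's `.map()` does not insert `null` or `undefined` return values.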

Just improved my own solution a little bit:

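// Append the separator to each text node instead of joining with it,
// so the '' returned for non-text nodes adds nothing.
var t = $('html *').contents().map(function() {
    return (this.type === 'text') ? $(this).text() + ' ' : '';
}).get().join('');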
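This version leaves a trailing space ("One Two "), and with indented real-world HTML the whitespace-only text nodes between tags (newlines and indentation) would be picked up too. A small post-processing step handles both. `joinTextParts` below is a hypothetical helper that takes the strings collected from the text nodes, for example the array returned by `.get()`; the input array is an assumed example, not Cheerio output captured from a real run.

```javascript
// Hypothetical helper: trim each collected text, drop pieces that are
// empty or whitespace-only, and join the rest with `separator`.
function joinTextParts(parts, separator = ' ') {
  return parts
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .join(separator);
}

// Assumed input: text-node contents from an indented document such as
// "<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>".
const parts = ['\n  ', 'One', '\n  ', 'Two', '\n'];

console.log(JSON.stringify(joinTextParts(parts)));       // "One Two"
console.log(JSON.stringify(joinTextParts(parts, '\n'))); // "One\nTwo"
```

Because each piece is trimmed, this also removes the space appended in the snippet above, so the same helper works whether or not the callback adds its own separator.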


2015-07-21 15:43:00 Crisboot

You can use the TextVersionJS package to generate a plain-text version of an HTML string. It works both in the browser and in Node.js.

var createTextVersion = require("textversionjs"); 

var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>"; 

var textVersion = createTextVersion(yourHtml);

Install it from npm and require it, for example with Browserify.


2016-07-27 14:32:51 Balint

The project is abandoned. – Toolkit
