Ruby Nokogiri - How to prevent Nokogiri from printing HTML character entities

I have an HTML file that I parse with Nokogiri and then write out as a new HTML file, like this:

require 'nokogiri'

htmltext = File.read('input.html')
h_doc = Nokogiri::HTML(htmltext)
# ... modify h_doc here ...

File.open('output.html', 'w') do |file|
  file.write(h_doc.to_html)
end

The problem is how to prevent Nokogiri from printing HTML character entities (such as &lt;, &gt; and &amp;) in the final generated HTML file.

Instead of the HTML character entities, I want it to print the actual characters (<, >, etc.).

For example, it prints the HTML like this:
<title>&lt;%= ("/emailclient=sometext") %&gt;</title> 
and I want it to output this instead:
<title><%= ("/emailclient=sometext")%></title>

Source

2014-09-02 user1788294

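So... you want Nokogiri to output incorrect or invalid XML/HTML?

The best suggestion I have is to replace those sequences with something else beforehand, process the document with Nokogiri, and then replace them back. Your input is not XML/HTML, so there is no point expecting Nokogiri to know how to handle it correctly. Because look: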

<div>To write "&amp;", you need to write "&amp;amp;".</div>

This renders as:

To write "&", you need to write "&amp;".

If you had your way, you would get this HTML:

<div>To write "&", you need to write "&amp;".</div>

Which would render as:

To write "&", you need to write "&".
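The round trip described above can be reproduced in plain Ruby. This is only a sketch: `escape_text` and `unescape_text` are hypothetical helpers standing in for what a serializer and a browser do, and they cover just the three characters discussed here.

```ruby
# Hypothetical stand-ins for a serializer (escape) and a browser (unescape).
ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze
UNESCAPES = ESCAPES.invert.freeze

# What a serializer does to a text node before writing it out.
def escape_text(text)
  text.gsub(/[&<>]/, ESCAPES)
end

# What a browser does when it decodes the markup back into text.
def unescape_text(markup)
  markup.gsub(/&(?:amp|lt|gt);/, UNESCAPES)
end

text = 'To write "&", you need to write "&amp;".'

# Escaping on output keeps the round trip lossless.
markup = escape_text(text)
puts markup
puts unescape_text(markup) == text

# Writing the text unescaped loses information once it is decoded.
puts unescape_text(text)
```

The last line shows the failure mode: once the text is written without escaping, the reader can no longer tell the literal "&amp;" from a plain "&".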

Even worse is a case like this one, for example in XHTML:

<div>Use the &lt;script&gt; tag for JavaScript</div>

If the entities were replaced, you would get an undisplayable document, because of the unclosed <script> tag:

<div>Use the <script> tag for JavaScript</div>
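The "replace beforehand, then replace back" suggestion could look roughly like the sketch below. `with_protected_erb` and the `ERBTOKEN…X` placeholder format are made up for illustration, and the block passed in is a stand-in for the real Nokogiri step (something like parsing the safe string, modifying the document, and calling `to_html`).

```ruby
# Hypothetical helper: hide template tags behind plain-text tokens,
# let the caller process the tag-free markup, then put the tags back.
ERB_TAG = /<%.*?%>/m

def with_protected_erb(html)
  stash = []
  safe = html.gsub(ERB_TAG) do |tag|
    stash << tag
    "ERBTOKEN#{stash.size - 1}X" # assumed not to occur in the real input
  end
  processed = yield(safe)
  processed.gsub(/ERBTOKEN(\d+)X/) { stash[Regexp.last_match(1).to_i] }
end

input = '<title><%= ("/emailclient=sometext") %></title>'
output = with_protected_erb(input) do |safe_html|
  # Stand-in for the Nokogiri processing step; it only ever sees the tokens.
  safe_html.sub('<title>', '<title class="t">')
end
puts output
```

Because the parser only sees ordinary text tokens, it has nothing to escape, and the original tags come back byte for byte.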

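EDIT: I still think you are trying to get Nokogiri to do something it was not designed to do: handle templated HTML. I would rather assume that your documents normally do not contain those sequences, and correct them afterwards: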

doc.traverse do |node|
  if node.text?
    node.content = node.content.gsub(/^(\s*)(\S.+?)(\s*)$/,
                                     "\\1<%= \\2 %>\\3")
  end
end
puts doc.to_html.gsub('&lt;%=', '<%=').gsub('%&gt;', '%>')
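A slightly stricter variant of that final `gsub` step is sketched here, working on the serialized string only. `restore_erb` is a hypothetical helper, not part of Nokogiri; it assumes the template tags arrive as `&lt;% ... %&gt;` and restores only matched pairs, so ordinary escaped text elsewhere is left alone.

```ruby
# Hypothetical post-correction helper for already-serialized HTML.
ENTITIES = { '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&amp;' => '&' }.freeze

def restore_erb(html)
  # Only touch complete "&lt;% ... %&gt;" pairs.
  html.gsub(/&lt;%(.*?)%&gt;/m) do
    # Undo escaping inside the tag body as well, in a single pass.
    inner = Regexp.last_match(1).gsub(/&(?:lt|gt|quot|amp);/, ENTITIES)
    "<%#{inner}%>"
  end
end

puts restore_erb('<title>&lt;%= ("/emailclient=sometext") %&gt;</title>')
puts restore_erb('<p>1 &lt; 2</p>')
```

The second call shows why pairing matters: a stray `&lt;` in normal text stays escaped, so the output remains valid HTML outside the template tags.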

Source

2014-09-02 04:35:29 Amadan

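I think there must be a way to do this. The original HTML is of the form sometext, and I want it replaced like this: <%sometext%>. But what I am getting is &lt;%sometext%&gt;. I seriously feel there must be some way. – user1788294 2014-09-02 06:34:14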

http://stackoverflow.com/questions/4476047/how-to-make-nokogiri-not-to-convert-nbsp-to-space. This links to a discussion about doing the opposite of what I want to do. – user1788294 2014-09-02 06:35:54

Just to add more information, I am changing the HTML text like this: h_doc.traverse do |x| if x.text? x.content = "<%" + x.content + "%>" end end – user1788294 2014-09-02 06:40:34

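You absolutely can prevent Nokogiri from transforming your entities. It is a built-in feature; no voodoo or hacking required. Be warned that I am not a Nokogiri guru: I have only got this to work when operating directly on a node inside a document, but I am sure a little digging would show you how to do it with standalone nodes too.

When you create or load your document, you need to include the NOENT option. That's it. You're done, and you can now add entities to your heart's content.

It is important to point out that there are about half a dozen ways to call a document with options; below is my personal favorite.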

require 'nokogiri'
noko_doc = File.open('<my/doc/path>') { |f| Nokogiri.<XML_or_HTML>(f, &:noent) }
xpath = '<selector_for_element>'
noko_doc.at_<css_or_xpath>(xpath).set_attribute('I_can_now_safely_add_preformatted_entities!', '&amp;&amp;&amp;&amp;&amp;')
puts noko_doc.at_xpath(xpath).attributes['I_can_now_safely_add_preformatted_entities!']
>>> &amp;&amp;&amp;&amp;&amp;

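As for the usefulness of this feature... I find it very useful. There are plenty of cases where you are dealing with preformatted data you do not control, and it would be a real pain to have to manage incoming entities just so that Nokogiri could put them back the way they were.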

Source

2016-03-03 17:30:35 JackChance
