How to convert a Net::HTTP response to a certain encoding in Ruby 1.9.1?

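I have a Sinatra application (http://analyzethis.espace-technologies.com) that does the following:

1. Retrieves an HTML page (via net/http)
2. Creates a Nokogiri document from the response.body
3. Extracts some information and sends it back in the response. The response should be UTF-8 encoded.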

I ran into this problem while trying to read sites that use the windows-1256 encoding, such as www.filfan.com or www.masrawy.com.

The problem is that the result of the encoding conversion is incorrect, even though no error is raised.

The net/http response.body.encoding gives ASCII-8BIT, which cannot be converted to UTF-8.
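To make the ASCII-8BIT point concrete, here is a small sketch that needs no network access. The two bytes `\xC7\xE1` stand in for a fragment of a Windows-1256 page (in that code page they are the Arabic letters alef and lam); the byte values are chosen for illustration only. Transcoding them straight from ASCII-8BIT fails, while relabelling them with force_encoding first and then calling encode works.

```ruby
# Bytes as Net::HTTP hands them over: labelled ASCII-8BIT (binary).
raw = "\xC7\xE1".dup.force_encoding(Encoding::ASCII_8BIT)

# Transcoding binary data straight to UTF-8 fails: Ruby cannot know
# which characters the bytes >= 0x80 are supposed to represent.
begin
  raw.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  puts "direct encode failed: #{e.class}"
end

# Relabel a copy with the real charset (no bytes change),
# then transcode it to UTF-8.
utf8 = raw.dup.force_encoding('Windows-1256').encode('UTF-8')
puts utf8.encoding         # prints "UTF-8"
puts utf8.valid_encoding?  # prints "true"
```

The key distinction: force_encoding only changes the label on the bytes, while encode actually converts them, so the label has to be correct before encode is called.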

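If I do Nokogiri::HTML(response.body) and use CSS selectors to get some content from the page (for example, the content of the title tag), I get a string whose string.encoding returns WINDOWS-1256. I call string.encode("utf-8") and send the response with that, but again the response is not correct.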

Any suggestions or ideas about what is wrong with my approach?


2009-07-30 humanzz

I found that the following code works for me for now:

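def document
  if @document.nil? && response
    @document = if document_encoding
      # Label the raw bytes with the detected charset, transcode them to UTF-8,
      # and tell Nokogiri that the input is UTF-8.
      Nokogiri::HTML(response.body.force_encoding(document_encoding).encode('utf-8'), nil, 'utf-8')
    else
      Nokogiri::HTML(response.body)
    end
  end
  @document
end

def document_encoding
  return @document_encoding if @document_encoding
  # 1. Look for a charset parameter in the Content-Type response header.
  response.type_params.each_pair do |k, v|
    @document_encoding = v.upcase if k =~ /charset/i
  end
  # 2. Otherwise look for <meta http-equiv="Content-Type" content="...; charset=...">.
  unless @document_encoding
    # Nokogiri-based variant, disabled: calling `document` here would recurse
    # back into document_encoding.
    #document.css("meta[http-equiv=Content-Type]").each do |n|
    #  attr = n.get_attribute("content")
    #  @document_encoding = attr.slice(/charset=[a-z0-9\-_]+/i).split("=")[1].upcase if attr
    #end
    # Capture only up to the closing quote of content="..." and only the charset token.
    @document_encoding = response.body =~ /<meta[^>]*HTTP-EQUIV=["']Content-Type["'][^>]*content=["']([^"']*)["']/i &&
                         $1 =~ /charset=([^\s;"']+)/i && $1.upcase
  end
  @document_encoding
end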
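The same header-then-meta lookup can be written as a plain function over strings, which makes it easy to test without Sinatra, Nokogiri or a live site. This is a sketch, not the author's code: `body_to_utf8` is a hypothetical name, the meta-tag regex is tightened to stop at the closing quote, and returning the bytes unchanged when no charset is found is an assumption.

```ruby
# Hypothetical helper, modelled on the answer above: find the charset in the
# Content-Type header, else in a <meta http-equiv="Content-Type"> tag, and
# return the body transcoded to UTF-8.
def body_to_utf8(content_type, body)
  # Work on a binary copy, which is what Net::HTTP returns anyway.
  bytes = body.dup.force_encoding(Encoding::ASCII_8BIT)

  # 1. charset parameter of the Content-Type header, if any.
  charset = content_type.to_s[/charset=["']?([^\s;"']+)/i, 1]
  # 2. Fallback: charset inside <meta http-equiv="Content-Type" content="...">.
  charset ||= bytes[/<meta[^>]*http-equiv=["']?Content-Type["']?[^>]*content=["'][^"']*charset=([^\s;"']+)/i, 1]

  # Assumption: with no charset anywhere, hand the bytes back untouched.
  return bytes unless charset

  # force_encoding only relabels the bytes; encode does the transcoding.
  # An unknown charset name raises ArgumentError here.
  bytes.force_encoding(charset).encode('UTF-8')
end
```

Matching the regexes against the binary copy avoids "invalid byte sequence" errors that would occur if the bytes were first mislabelled as UTF-8.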


2009-08-02 00:43:15 humanzz

It works great! – 2016-10-28 13:32:02

This happens because Net::HTTP does not handle encodings correctly. See http://bugs.ruby-lang.org/issues/2567

You can parse response['content-type'], which contains the charset, instead of parsing the whole response.body.

Then use force_encoding() to set the correct encoding.

For example, call response.body.force_encoding("UTF-8") if the site is served as UTF-8.
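A minimal sketch of this suggestion using only the standard library: Net::HTTPHeader#type_params already splits the Content-Type header into its parameters, so the body never has to be scanned. `charset_of` and `retag_body` are hypothetical helper names, and copying the body before force_encoding (so the response's own string is not mutated) is a design choice, not a requirement.

```ruby
require 'net/http'

# Hypothetical helper: charset parameter of the Content-Type header, or nil.
# type_params turns "text/html; charset=windows-1256"
# into {"charset" => "windows-1256"}.
def charset_of(response)
  params = response.type_params
  key = params.keys.find { |k| k.casecmp('charset').zero? }
  key && params[key]
end

# Hypothetical helper: relabel the body with the header's charset and
# transcode it to UTF-8; without a charset the body is returned as-is.
def retag_body(response, body)
  charset = charset_of(response)
  charset ? body.dup.force_encoding(charset).encode('UTF-8') : body
end

# Usage against a live server (not run here):
#   res  = Net::HTTP.get_response(URI('http://www.masrawy.com/'))
#   html = retag_body(res, res.body)
```

If the header has no charset, this approach alone cannot decide the encoding, which is where a fallback such as the meta-tag scan from the first answer comes in.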


2012-12-08 17:03:12

Although this solution does work, the problem only occurs on some sites. Perhaps responses whose Content-Type contains 'application/json' are UTF-8 encoded...? According to http://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean, application/json implies UTF-8. – 2014-05-28 14:40:20
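Following that comment, a charset picker could treat application/json as UTF-8 when the header carries no charset parameter; RFC 8259 requires JSON exchanged between systems to be UTF-8. A sketch under that assumption, with `effective_charset` as a hypothetical name: an explicit charset parameter always wins, and other media types without one return nil so the caller can fall back to something else (for example the meta-tag scan from the first answer).

```ruby
# Hypothetical helper: choose a charset from a Content-Type header value.
def effective_charset(content_type)
  media_type, *params = content_type.to_s.split(';').map(&:strip)
  # An explicit charset parameter wins (surrounding quotes are dropped).
  params.each do |param|
    key, value = param.split('=', 2)
    return value.delete('"') if key && value && key.casecmp('charset').zero?
  end
  # Assumption: JSON without a charset parameter is UTF-8; otherwise undecided.
  'UTF-8' if media_type && media_type.casecmp('application/json').zero?
end
```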
