2012-07-28

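RoR: ASCII-8BIT to UTF-8 for non-Latin (Cyrillic) symbols in Net::HTTP.get_response.body

I need to fetch some data via Net::HTTP. That works fine, but the body I receive is tagged ASCII-8BIT. How can I convert it to UTF-8 and keep all the non-Latin symbols?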

With @content.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace, :replace => '') I lose all the Cyrillic symbols.

With @content.encode('utf-8', 'binary') I get a "\xCB" from ASCII-8BIT to UTF-8 error.

With @content.force_encoding("UTF-8") I get garbled characters instead of the Cyrillic symbols.
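The three results above can be reproduced in a small sketch. It assumes the server is sending Windows-1251 bytes (consistent with the "\xCB" in the error, which is a Cyrillic capital letter in that code page, but still an assumption), and the `raw` string is a made-up stand-in for the response body, not data from the original request.

```ruby
# Hypothetical stand-in for response.body: the word "Привет" as Windows-1251
# bytes, tagged ASCII-8BIT the way Net::HTTP hands the body back.
raw = "\xCF\xF0\xE8\xE2\xE5\xF2".b

# 1. Transcoding from 'binary' with replacement options: no byte above 0x7F
#    has a mapping from ASCII-8BIT to UTF-8, so each one is replaced by ''.
stripped = raw.encode('utf-8', 'binary', invalid: :replace, undef: :replace, replace: '')

# 2. The same transcoding without replacement options raises instead.
error = begin
  raw.encode('utf-8', 'binary')
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# 3. force_encoding only relabels the bytes. They are not valid UTF-8,
#    so the string is broken rather than converted.
relabeled = raw.dup.force_encoding('UTF-8')

# Naming the actual source encoding is what preserves the text.
decoded = raw.encode('UTF-8', 'Windows-1251')
```

In each failing case the real source encoding is never named, so Ruby has no way to map the high bytes to Cyrillic code points.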

I could not find an answer by googling.

Answer

The problem was solved with:

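begin
  # If the bytes are already valid UTF-8, relabeling them is enough.
  cleaned = response.body.dup.force_encoding('UTF-8')
  unless cleaned.valid_encoding?
    # Otherwise treat the body as Windows-1251 and transcode it.
    cleaned = response.body.encode('UTF-8', 'Windows-1251')
  end
  content = cleaned
rescue EncodingError
  # content is still unassigned here, so transcode from response.body again,
  # this time replacing bytes that have no mapping instead of raising.
  content = response.body.encode('UTF-8', 'Windows-1251', invalid: :replace, undef: :replace)
end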
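The same idea can be wrapped in a small method so it is easy to try on sample data. This is only a sketch: `to_utf8` and its `fallback` parameter are hypothetical names, not part of Net::HTTP or of the original answer, and it assumes that any body which is not valid UTF-8 is in the fallback encoding.

```ruby
# Hypothetical helper: returns a UTF-8 copy of an HTTP body, assuming any
# body that is not already valid UTF-8 is in the `fallback` encoding.
def to_utf8(body, fallback = 'Windows-1251')
  candidate = body.dup.force_encoding('UTF-8')
  return candidate if candidate.valid_encoding?

  # Not valid UTF-8: transcode from the assumed legacy encoding, replacing
  # unmappable bytes instead of raising.
  body.encode('UTF-8', fallback, invalid: :replace, undef: :replace)
end

# Made-up sample bodies, both spelling "При":
from_utf8   = to_utf8("\xD0\x9F\xD1\x80\xD0\xB8".b) # bytes that are already UTF-8
from_cp1251 = to_utf8("\xCF\xF0\xE8".b)             # Windows-1251 bytes
```

Checking `valid_encoding?` first matters because transcoding a body that is already UTF-8 as if it were Windows-1251 would garble it.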

