2010-05-28 156 views
0

Ruby file encoding

I'm having a problem with file encoding.

I receive a URL-encoded string such as "sometext%C3%B3+more+%26+andmore", process the data, and save it to a file encoded as windows-1252.

These are the conversions:

irb(main) >> value 
=> "sometext%C3%B3+more+%26+andmore" 
irb(main) >> CGI::unescape(value) 
=> "sometext\303\263 more & andmore" 
irb(main) >> #Some code and saved into a file using open(filename, "w:WINDOWS-1252") 
irb(main) >> # result in the file: 
=> sometextĂ³ more & andmore 

The result should be: sometextó more & andmore

Answers

4

Encoding support was added in Ruby 1.9, so the following code is from 1.9.1:

require 'cgi' 
#=> true 
s = "sometext%C3%B3+more+%26+andmore" 
#=> "sometext%C3%B3+more+%26+andmore" 
t = CGI::unescape s 
#=> "sometext\xC3\xB3 more & andmore" 
t.force_encoding 'utf-8' # telling Ruby that the string is UTF-8 encoded 
#=> "sometextó more & andmore" 
t.encode! 'windows-1252' # changing encoding to windows-1252 
#=> "sometext\xF3 more & andmore" 
# \xF3 is ó in windows-1252; a terminal that expects UTF-8 may display it as "?" 
# here you do whatever you want to do with windows-1252 encoded string 
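
Since the question saves the result with open(filename, "w:WINDOWS-1252"), here is a minimal sketch of that last step for Ruby 1.9 or later. It relies on Ruby transcoding a string to the file's external encoding on write, so the string only needs to be tagged as UTF-8 beforehand. The file name and temp-directory path are hypothetical, chosen only for illustration.

```ruby
require 'cgi'
require 'tmpdir'

value = "sometext%C3%B3+more+%26+andmore"

# Decode the URL-encoded string and tag the resulting bytes as UTF-8.
text = CGI.unescape(value).force_encoding('UTF-8')

# Hypothetical output path, used only for this illustration.
path = File.join(Dir.tmpdir, 'encoding_demo.txt')

# The external encoding in the mode string makes Ruby transcode
# the UTF-8 string to windows-1252 as it is written.
File.open(path, 'w:windows-1252') { |f| f.write(text) }

# Read the raw bytes back to check what actually landed on disk.
bytes = File.binread(path)
```

If the string is correctly tagged, the ó should be stored as the single byte 0xF3 rather than the two UTF-8 bytes 0xC3 0xB3.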

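Here you can find a lot more information about Ruby and encodings.

PS. Ruby 1.8.7 has no built-in encoding support, so you have to use an external library for the conversion, for example iconv: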

require 'iconv' 
#=> true 
require 'cgi' 
#=> true 
s = "sometext%C3%B3+more+%26+andmore" 
#=> "sometext%C3%B3+more+%26+andmore" 
t = CGI::unescape s 
#=> "sometext\303\263 more & andmore" 
Iconv.conv 'windows-1252', 'utf-8', t 
#=> "sometext\363 more & andmore" 
# \363 is ó in windows-1252 encoding 
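
As for why the file in the question came out garbled, the sketch below reproduces a similar symptom on Ruby 1.9 or later (it needs String#encode, so it does not run on 1.8.7). It assumes the decoded UTF-8 bytes were treated as if they were already windows-1252; that is a guess about the original code, not something the question states.

```ruby
require 'cgi'

raw = CGI.unescape("sometext%C3%B3+more+%26+andmore")

# Assumption: the two UTF-8 bytes of "ó" (0xC3 0xB3) are read as
# two separate windows-1252 characters, which produces mojibake.
misread = raw.dup.force_encoding('windows-1252').encode('UTF-8')

# Intended path: declare the bytes as UTF-8, then transcode them.
correct = raw.dup.force_encoding('UTF-8').encode('windows-1252')
```

Here misread contains two characters where ó should be, while correct holds ó as the single byte \363 (0xF3), matching the Iconv result above.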

I didn't mention it, but I need a solution for Ruby 1.8.7 (but thanks :)) – pablorc 2010-05-28 14:09:48


I have updated the answer accordingly. – 2010-05-29 12:21:16


There were some problems with my input, but this works. Thanks! – pablorc 2010-05-31 08:10:11