2013-04-22 70 views
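R script file encoding (R Studio)

Which file encoding do I have to use to save this vector (taken from "Matching complex URLs within text blocks (R)") correctly in an R script? The special characters and Chinese characters seem to complicate things.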

x <- c("http://foo.com/blah_blah", 
     "http://foo.com/blah_blah/", 
     "(Something like http://foo.com/blah_blah)", 
     "http://foo.com/blah_blah_(wikipedia)", 
     "http://foo.com/more_(than)_one_(parens)", 
     "(Something like http://foo.com/blah_blah_(wikipedia))", 
     "http://foo.com/blah_(wikipedia)#cite-1", 
     "http://foo.com/blah_(wikipedia)_blah#cite-1", 
     "http://foo.com/unicode_(✪)_in_parens", 
     "http://foo.com/(something)?after=parens", 
     "http://foo.com/blah_blah.", 
     "http://foo.com/blah_blah/.", 
     "<http://foo.com/blah_blah>", 
     "<http://foo.com/blah_blah/>", 
     "http://foo.com/blah_blah,", 
     "http://www.extinguishedscholar.com/wpglob/?p=364.", 
     "http://✪df.ws/1234", 
     "rdar://1234", 
     "rdar:/1234", 
     "x-yojimbo-item://6303E4C1-6A6E-45A6-AB9D-3A908F59AE0E", 
     "message://%[email protected]%3e", 
     "http://➡.ws/䨹", 
     "www.c.ws/䨹", 
     "<tag>http://example.com</tag>", 
     "Just a www.example.com link.", 
     "http://example.com/something?with,commas,in,url, but not at end", 
     "What about <mailto:[email protected]?subject=TEST> (including brokets).", 
     "mailto:[email protected]", 
     "bit.ly/foo", 
     "“is.gd/foo/”", 
     "WWW.EXAMPLE.COM", 
     "http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55))/Web_ENG/View_DetailPhoto.aspx?PicId=752", 
     "http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55))", 
     "http://lcweb2.loc.gov/cgi-bin/query/h?pp/horyd:@field([email protected](thc+5a46634))") 

I would appreciate any help.

Which encoding are you currently using? What does getOption("encoding") give you? I tried source('file.R', encoding="native.enc") and source('file.R', encoding="unknown"), with 'file.R' being the script, and it was read in. – user1981275 2013-04-22 14:21:57

Answer

Running your example with

source('file.R', encoding="unknown") 

works fine, and saving it as an R object and reloading it works as well:

save(x, file='kk.Rd') 
load('kk.Rd') 
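
To confirm that the reloaded object really is unchanged, rather than merely loading without error, the two copies can be compared directly. This is a minimal sketch, not part of the original answer: the two-element vector is a hypothetical stand-in for the full x, written with \u escapes so it does not depend on how the script file itself is saved, and rdata_path and restored are illustrative names.

```r
# Hypothetical sample with non-ASCII characters, written as \u escapes
x <- c("http://foo.com/unicode_(\u272a)_in_parens", "http://\u27a1.ws/\u4a39")

# Save to a temporary file instead of the working directory
rdata_path <- tempfile(fileext = ".RData")
save(x, file = rdata_path)

# Load into a separate environment so the original x is not overwritten
restored <- new.env()
load(rdata_path, envir = restored)

# TRUE only if the strings survived the round trip unchanged
roundtrip_ok <- identical(x, restored$x)
print(roundtrip_ok)

# Shows the encoding mark R attached to each restored string
print(Encoding(restored$x))
```

Loading into a separate environment keeps both copies available for the comparison.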

You can get all available encodings with iconvlist() and test each of them, e.g.:

vals <- lapply(iconvlist(), function(x) 
         tryCatch(source('file.R', encoding=x),     
           error=function(e)return(NULL))) 

with file.R being your script. Then

iconvlist()[which(!sapply(vals, function(x)is.null(x)))] 

gives you all encodings for which no error occurred during loading.
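
Note that loading without an error does not guarantee that the characters were decoded correctly. A stricter check is to compare the sourced vector against known reference values. The following is only a sketch under stated assumptions: the script is assumed to be stored as UTF-8, the short vector is a hypothetical sample rather than the full x, and script_path, decodes_correctly and candidates are illustrative names. The result for each encoding depends on the locale of the R session.

```r
# Reference values built from \u escapes, independent of this file's own encoding
expected <- c("http://foo.com/unicode_(\u272a)_in_parens", "http://\u27a1.ws/\u4a39")

# Write a small script to a temporary file as UTF-8 bytes
script_path <- tempfile(fileext = ".R")
script_text <- paste0("x <- c(", paste0('"', expected, '"', collapse = ", "), ")")
writeLines(enc2utf8(script_text), script_path, useBytes = TRUE)

# Source the script under one encoding and report whether x matches the reference;
# errors and warnings both count as a failed decode
decodes_correctly <- function(enc) {
  env <- new.env()
  ok <- tryCatch({
    source(script_path, local = env, encoding = enc)
    identical(enc2utf8(env$x), enc2utf8(expected))
  }, error = function(e) FALSE, warning = function(w) FALSE)
  isTRUE(ok)
}

# Two candidate encodings for illustration; iconvlist() could be used instead
candidates <- c("UTF-8", "latin1")
results <- vapply(candidates, decodes_correctly, logical(1))
print(results)
```

Unlike the is.null() test above, this flags encodings that load silently but produce wrong characters.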

Does this help?

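I tried various encodings manually, and every time I open the R script the characters mentioned are displayed incorrectly (i.e. replaced by other characters). So there is never an error, but the content is still wrong. – majom 2013-04-23 13:24:50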

When I open the file in RStudio, all the symbols appear to be displayed correctly. Which symbols does it fail for? Which platform are you working on? – user1981275 2013-04-23 13:34:59