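Read CSV with both paired and unpaired quotation marks
I have a CSV file, generated from MS SQL Server, that I would like to read into R. It contains data similar to the following: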
# reproduce file
possibilities <- c('this is good','"this has, a comma"','here is a " quotation','')
newstrings <- expand.grid(possibilities,possibilities,possibilities,stringsAsFactors = F)
xwrite <- apply(newstrings,1,paste,collapse = ",")
xwrite <- c('v1,v2,v3',xwrite)
writeLines(xwrite,con = 'test.csv')
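Normally I would open this in Excel, which magically reads it and writes out a cleaner format for R, but this file exceeds Excel's row limit. If I can't figure it out, I'll have to go back and export the data in another format. I have tried many variations of what I've read.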
# a few things I've tried
(rl <- readLines('test.csv'))
read.csv('test.csv',header = T,quote = "",stringsAsFactors = F)
read.csv('test.csv',header = F,quote = "",stringsAsFactors = F,skip = 1)
read.csv('test.csv',header = T,stringsAsFactors = F)
read.csv('test.csv',header = F,stringsAsFactors = F,skip = 1)
read.table('test.csv',header = F)
read.table('test.csv',header = F,quote = "\"")
read.table('test.csv',header = T,sep = ",")
scan('test.csv',what = 'character')
scan('test.csv',what = 'character',sep = ",")
scan('test.csv',what = 'character',sep = ",",quote = "")
scan('test.csv',what = 'character',sep = ",",quote = "\"")
unlist(strsplit(rl,split = ','))
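The following seems to work for the data I have, but I'm not comfortable reusing it, because it does not handle the sixth line, which illustrates data that could occur in another file.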
# works if only comma OR unpaired quotation but not both
rl[grep('^[^\"]*\"[^\"]*$',rl)] <- sub('^([^\"]*)(\")([^\"]*)$','\\1\\3',rl[grep('^[^\"]*\"[^\"]*$',rl)])
writeLines(rl,'testfixed.csv')
read.csv('testfixed.csv')
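To make the failing case concrete, here is a minimal sketch that flags lines containing an odd number of double quotes, which is where an unpaired quotation mark must be present. The sample lines in `rl` are hypothetical stand-ins shaped like the reproduction above, and the names `n_quotes` and `odd` are introduced only for illustration. This only locates suspect lines; it does not repair them.

```r
# Hypothetical sample lines mirroring the reproduction file (assumption, not real data)
rl <- c('v1,v2,v3',
        'this is good,"this has, a comma",',
        'here is a " quotation,this is good,',
        '"this has, a comma",here is a " quotation,this is good')

# Count the double-quote characters on each line
n_quotes <- lengths(regmatches(rl, gregexpr('"', rl, fixed = TRUE)))

# An odd count means at least one quotation mark on the line is unpaired
odd <- which(n_quotes %% 2 == 1)

# Show the suspect lines; the last one has both a quoted comma and a stray quote
rl[odd]
```

A line such as the last one, with a properly quoted field plus a stray quote, is exactly the kind the `sub()` workaround above leaves untouched, since its pattern only matches lines with a single quote.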
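I found a similar problem, but in my case the quotation marks are stray characters within the data, not a consistent formatting issue.
Is it possible to get a correct data.frame from this?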