2016-03-01

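Ruby: CSV parser tripping over double quotes in my data

I'm working on a rake task, scheduled to run daily, that downloads a CSV automatically sent to Dropbox every day, parses it, and saves it to the database. I have no control over how data is entered into the program that generates the CSV report, so I can't avoid double quotes appearing inside some of the data. However, I'd like to know whether there is a way to remove them or replace them with single quotes in the rake task, or to somehow tell the parser about them so that it doesn't raise an error.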

Rake task code:

require 'net/http'
require 'csv'
require 'open-uri'

namespace :fp_import do
  desc "download abc_relations from dropbox, save as csv, create or update record in db"
  task :fp => :environment do
    data = URI.parse("<<file's dropbox link>>").read

    File.open(Rails.root.join('lib/assets', 'fp_relation.csv'), 'w') do |file|
      file.write(data)
    end

    file = Rails.root.join('lib/assets', 'fp_relation.csv')

    CSV.foreach(file) do |row|
      div, fg_style, fg_color, factory, part_style, part_color, comp_code, vendor, design_no, comp_type = row
      fg_sku = fg_style + "-" + fg_color
      part_sku = part_style + "-" + part_color

      relation = FgPart.where('part_sku LIKE ? AND fg_sku LIKE ?', "%#{part_sku}%", "%#{fg_sku}%").exists?
      if relation == false
        FgPart.create(fg_style: fg_style, fg_color: fg_color, fg_sku: fg_sku, factory: factory, part_style: part_style, part_color: part_color, part_sku: part_sku, comp_code: comp_code, comp_type: comp_type, design_no: design_no)
      end
    end
  end
end

This CSV has about 35,000 rows. Below is a sample; you can see the double quotes in row 4 of the sample.

Sample data:

"01","502210","018","ZH","5931","001","M","","UPHOLSTERED GLIDER A","RM" 
"01","502310","053","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG" 
"01","502310","065","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG" 
"01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG" 
"01","","277","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG" 
"01","503310","076","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG" 
"01","506210","018","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG" 
"01","506210","467","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG" 
"01","507610","932","AZ","25332","NO","O","","GLIDER","BAG" 
"01","507610","932","AZ","5936","001","M","","GLIDER","RM" 
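The failure can be reproduced with row 4 of the sample on its own. A minimal sketch using Ruby's standard `csv` library; `parse_error` is a hypothetical helper that returns the `CSV::MalformedCSVError` that CSV raises for a row, or `nil` if the row parses:

```ruby
require 'csv'

# Returns the CSV::MalformedCSVError raised for +row+, or nil if it parses.
def parse_error(row)
  CSV.parse_line(row)
  nil
rescue CSV::MalformedCSVError => e
  e
end

good_row = '"01","502310","053","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"'
bad_row  = '"01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG"'

p parse_error(good_row)        # prints nil
puts parse_error(bad_row).message
```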

@Flip I'm not sure your edit is correct. @Tatiane: are the '**' parts actually in the CSV data, or did you use them to mark key code? – knut


If all the data looks like this extract, you could remove all " characters before you use CSV. – knut
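knut's suggestion amounts to deleting every `"` and parsing what is left as plain, unquoted CSV. A rough sketch, assuming no value contains a comma or a line break (a comma inside a value would split it into two columns); `strip_all_quotes` is a hypothetical helper name. Note that the quotes inside the description are lost, so `GLIDER"AUS"` becomes `GLIDERAUS`:

```ruby
require 'csv'

# Deletes every double quote, then parses what is left as plain, unquoted CSV.
# Only safe when no value contains a comma or a line break.
def strip_all_quotes(data)
  # Unquoted empty fields parse as nil, so map them back to "".
  CSV.parse(data.delete('"')).map { |row| row.map(&:to_s) }
end

data = <<~TEXT
  "01","502310","053","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
  "01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG"
TEXT

strip_all_quotes(data).each { |row| p row }
```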


@knut: I see, you might be right. I'll undo that part, thanks for pointing it out. – Flip

Answers


The source CSV is malformed; the quotes should have been escaped beforehand.
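For reference, in standard CSV (RFC 4180) a double quote inside a quoted field is escaped by doubling it. A small sketch with Ruby's `csv` library showing how row 4 would look if the exporting program escaped it correctly, and that such a line parses back to the original values:

```ruby
require 'csv'

# The values of sample row 4, with the description's quotes kept as data.
values = ['01', '502312', '424', 'ZH', '25332', 'NO', 'O', '', 'UPHOLSTERED GLIDER"AUS"', 'BAG']

# Written correctly, each embedded quote is doubled inside the quoted field.
line = CSV.generate_line(values, force_quotes: true)
puts line

# A correctly escaped line parses back to exactly the same values.
p CSV.parse_line(line) == values # prints true
```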

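I would edit the file before CSV parses it: remove the quotes around the commas and replace the remaining double quotes with single quotes. You can write the result to a new file in case you don't want to edit the original.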

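# Writes a cleaned copy next to the original and returns its path.
def fix_csv(file)
  fixed = File.join(File.dirname(file), "fixed_" + File.basename(file))
  File.open(fixed, 'w') do |out|
    File.foreach(file) do |line|
      line = line.strip             # drop the newline (and any CR or trailing spaces)
      next if line.empty?
      line = line[1...-1]           # remove the beginning and end quotes
      line = line.gsub('","', ',')  # remove all quotes around the commas
      line = line.tr('"', "'")      # replace remaining double quotes with single quotes
      out << line << "\n"           # add the line plus a newline to the output
    end
  end
  fixed
end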

If you want to modify the same CSV file in place, you can do it like this:

require 'tempfile'
require 'fileutils'

def modify_csv(file)
  temp_file = Tempfile.new('temp')
  begin
    File.foreach(file) do |line|
      line = line.strip
      next if line.empty?
      line = line[1...-1].gsub('","', ',').tr('"', "'")
      temp_file << line << "\n"
    end
    temp_file.close
    FileUtils.mv(temp_file.path, file)  # replace the original with the cleaned copy
  ensure
    temp_file.close
    temp_file.unlink                    # harmless if the file was already moved
  end
end

This case is explained here; have a look, it should fix or sanitize your original CSV file.
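A different technique, not the one used in this answer, is to give CSV a quote character that never occurs in the report (here `"\x00"`), so every `"` is treated as ordinary text and each line is split on commas only; the surrounding quotes are then trimmed from each value. A sketch under that assumption; `parse_ignoring_quotes` is a hypothetical helper, and like a bare `split(',')` it breaks if a value contains a comma:

```ruby
require 'csv'

# Parses one line with a quote character that never appears in the data,
# then trims one leading and one trailing double quote from every value.
def parse_ignoring_quotes(line)
  CSV.parse_line(line, quote_char: "\x00").map do |value|
    value.to_s.delete_prefix('"').delete_suffix('"')
  end
end

row = parse_ignoring_quotes('"01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG"')
p row[8] # prints "UPHOLSTERED GLIDER\"AUS\""
```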


Why would you fix it and then parse it again? Once it's fixed, it has already been parsed and is ready to import. –


@pascalbetz, just in case you don't want to modify the original CSV, as I see it. – Agush


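Unless you need the cleaned file for some other process, you can leave the original as it is and import into AR as soon as the cleanup is done. So there's no need to read, clean, write, read again, and import. –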
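The read, clean, import flow suggested in the comment above can be sketched without an intermediate file. This assumes the same cleaning rules as `fix_csv` (outer quotes stripped, `","` collapsed to `,`, stray `"` turned into `'`); `each_clean_row` is a hypothetical name, and in the rake task it could be fed `File.foreach(file)` instead of the sample array:

```ruby
require 'csv'

# Cleans each raw line (outer quotes removed, "," collapsed to a plain comma,
# stray double quotes turned into single quotes) and yields the parsed values,
# without writing an intermediate file.
def each_clean_row(lines)
  lines.each do |raw|
    line = raw.strip
    next if line.empty?

    line = line[1...-1].gsub('","', ',').tr('"', "'")
    # Unquoted empty fields parse as nil; turn them back into "".
    yield CSV.parse_line(line).map(&:to_s)
  end
end

sample = [
  %("01","502310","053","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"\n),
  %("01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG"\n)
]

each_clean_row(sample) { |row| p row }
```

Each yielded row can then go straight into the `FgPart` lookup and `create` from the question.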


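The CSV is invalid; the quotes should be escaped. Since the file can be read line by line, you can split each line on `,` and remove the leading and trailing `"`; no other special handling is needed.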

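File.foreach(path) do |line|
  # strip first, otherwise the last column keeps its newline and loses its last letter
  columns = line.strip.split(',').map do |column|
    column[1...-1] # remove the leading and trailing double quote
  end
  do_something_with_data(columns)
end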

Updated version

# File.foreach closes the file when done; strip removes the trailing newline
File.foreach(File.join(__dir__, 'input.almost_csv')) do |line|
  values = line.strip.split(',').map do |value|
    value[1...-1] # Remove leading and trailing double-quote
  end

  div, fg_style, fg_color, factory, part_style, part_color, comp_code, vendor, design_no, comp_type = values
  fg_sku = fg_style + "-" + fg_color
  part_sku = part_style + "-" + part_color

  unless FgPart.where('part_sku LIKE ? AND fg_sku LIKE ?', "%#{part_sku}%", "%#{fg_sku}%").exists?
    FgPart.create(fg_style: fg_style, fg_color: fg_color, fg_sku: fg_sku, factory: factory, part_style: part_style, part_color: part_color, part_sku: part_sku, comp_code: comp_code, comp_type: comp_type, design_no: design_no)
  end
end

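Note that:

  • You don't need `@`; local variables are enough.
  • If you want to remove the quotes inside the strings as well, you can do that in the `map`.
  • This only works if there is no column separator (`,`) inside the values.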
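To work around the last caveat, one option (an assumption, not part of this answer) is to split on the three-character sequence `","` instead of a bare comma, so commas inside a value survive. This sketch assumes every field is quoted, no value contains `","` itself, and Ruby 2.5+ for `delete_prefix`/`delete_suffix`; `split_quoted_line` is a hypothetical helper, and the second sample line is made up to include a comma:

```ruby
# Splits one line of the report on "," (quote, comma, quote) rather than on a
# bare comma, so a comma inside a value stays part of that value.
def split_quoted_line(line)
  inner = line.strip.delete_prefix('"').delete_suffix('"')
  inner.split('","', -1) # -1 keeps trailing empty fields
end

p split_quoted_line(%("01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG"\n))
p split_quoted_line(%("01","","277","ZH","25332","NO","O","","GLIDER, BLUE","BAG"\n))
```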

Thanks Pascal! I'm not sure how to carry this on into my code, sorry. Could you add more explanation using the code I posted? –