Efficient bulk update of a Rails database

I'm trying to build a rake utility that will update my database every so often.

This is the code I have so far:

namespace :utils do

  # utils:update_ip
  # Downloads the file from <url> to the temp folder, then unzips it to <file_path>.
  # Then updates the database.

  desc "Update ip-to-country database"
  task :update_ip => :environment do

    require 'open-uri'
    require 'zip/zipfilesystem'
    require 'csv'

    file_name = "ip-to-country.csv"
    file_path = "#{RAILS_ROOT}/db/" + file_name
    url = 'http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip'

    # Check the last time we updated the database.
    mod_time = ''
    mod_time = File.new(file_path).mtime.httpdate if File.exists? file_path

    begin
      puts 'Downloading update...'
      # Send a conditional GET to the server.
      zipped_file = open(url, {'If-Modified-Since' => mod_time})
    rescue OpenURI::HTTPError => the_error
      if the_error.io.status[0] == '304'
        puts 'Nothing to update.'
      else
        puts 'HTTPError: ' + the_error.message
      end
    else # File was downloaded without error.

      Rails.logger.info 'ip-to-country: Remote database was last updated: ' + zipped_file.meta['last-modified']
      delay = Time.now - zipped_file.last_modified
      Rails.logger.info "ip-to-country: Database was outdated for: #{delay} seconds (#{delay/60/60/24 } days)"

      puts 'Unzipping...'
      File.delete(file_path) if File.exists? file_path
      Zip::ZipFile.open(zipped_file.path) do |zipfile|
        zipfile.extract(file_name, file_path)
      end

      Iptocs.delete_all

      puts "Importing new database..."

      # TODO: way, way too heavy; find a better solution.

      CSV.open(file_path, 'r') do |row|
        ip = Iptocs.new( :ip_from       => row.shift,
                         :ip_to         => row.shift,
                         :country_code2 => row.shift,
                         :country_code3 => row.shift,
                         :country_name  => row.shift)
        ip.save
      end #CSV
      puts "Complete."

    end #begin-rescue
  end #task
end #namespace

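The problem I'm having is that this takes several minutes to insert the 100,000-plus entries. I'd like to find a more efficient way to update my database. Ideally this would stay independent of the database type, but if that isn't possible, my production server will be running MySQL.

Thanks for any insight.

Source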

2010-02-17 codr

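Have you tried using AR Extensions for bulk import? You get impressive performance improvements when inserting thousands of rows into the database. Visit their website for more details.

See these examples for more information.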

2010-02-18 04:48:51

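This is exactly what I was looking for, thanks. – codr 2010-02-19 01:54:05

The gem supports importing from CSV. This eliminates the `ActiveRecord` instantiation and validation costs. For more details, see this article: http://www.rubyinside.com/advent2006/17-extendingarhtml – 2010-02-19 02:24:52

Helped me too - thanks! – ambertch 2010-05-17 19:57:58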

You could generate a text file containing all the INSERT statements you need and then run:

mysql -u user -p db_name < mytextfile.txt

I'm not sure whether this will be any faster, but it's worth a try...
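As a rough illustration of that idea, here is a minimal Ruby sketch that writes such a file. The helper name `write_insert_script`, the table name `iptocs`, and the sample rows are assumptions made for illustration. The quoting is deliberately naive and only doubles single quotes, so real data would need the database adapter's own quoting. Wrapping the statements in one transaction is an addition not mentioned in the answer; it is meant to avoid a commit per row on a transactional storage engine.

```ruby
require 'tempfile'

# Hypothetical helper: writes one INSERT per row, wrapped in a single
# transaction, to the given IO object.
# NOTE: the quoting below is naive (it only doubles single quotes) and is
# for illustration only.
def write_insert_script(io, table, columns, rows)
  io.puts "START TRANSACTION;"
  rows.each do |row|
    values = row.map { |value| "'" + value.to_s.gsub("'", "''") + "'" }.join(", ")
    io.puts "INSERT INTO #{table} (#{columns.join(', ')}) VALUES (#{values});"
  end
  io.puts "COMMIT;"
end

# Made-up sample rows; in the rake task they would come from the CSV file.
sample_rows = [
  ["1", "10", "AA"],
  ["11", "20", "BB"],
]

script = Tempfile.new(["iptocs", ".sql"])
write_insert_script(script, "iptocs", %w[ip_from ip_to country_code2], sample_rows)
script.flush

script_lines = File.readlines(script.path).map(&:chomp)
puts script_lines
```

The resulting file could then be fed to the `mysql` client exactly as in the command above.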

Source

2010-02-17 22:41:34 Zepplock

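Rails itself uses SQL INSERT statements; look at your Rails log. So this approach won't improve the speed. – 2010-02-17 22:44:26

Of course Rails uses INSERT; how else would it add records to the database? But in his original post the author is using the `save` method, which has a lot more overhead than a plain INSERT. I'm fairly sure it involves a commit per insert, model validation, and so on. – Zepplock 2010-02-18 00:52:51

Use the database-level utilities for high speed, Luke!

Unfortunately, they are database-specific, but they are fast. For MySQL, see http://dev.mysql.com/doc/refman/5.1/en/load-data.html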
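For MySQL, the statement might be built from Ruby roughly as follows. This is only a sketch under assumptions: the helper name `load_data_sql` is hypothetical, the table name `iptocs` and the column list are guessed from the question's model, and the CSV is assumed to be comma-separated with optionally double-quoted fields and `\n` line endings. `LOCAL` also requires that local infile loading is enabled on both client and server.

```ruby
# Hypothetical helper: builds a MySQL LOAD DATA statement for a CSV file.
# Assumes the path contains no single quotes; it is interpolated verbatim.
def load_data_sql(csv_path, table, columns)
  <<~SQL
    LOAD DATA LOCAL INFILE '#{csv_path}'
    INTO TABLE #{table}
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    (#{columns.join(', ')})
  SQL
end

# Column names taken from the question's Iptocs.new call; the path is an example.
load_sql = load_data_sql(
  "db/ip-to-country.csv",
  "iptocs",
  %w[ip_from ip_to country_code2 country_code3 country_name]
)
puts load_sql

# Inside the rake task this would be sent to the database with:
#   ActiveRecord::Base.connection.execute(load_sql)
```

This bypasses ActiveRecord object instantiation and validation entirely, which is where the speed comes from, at the cost of tying the task to MySQL.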

Source

2010-02-17 22:42:18

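As Larry said, use your database's own import utility if the file arrives in the format you want. However, if you need to manipulate the data before inserting it, you can generate a single INSERT query containing the data for multiple rows, which is faster than issuing a separate query for each row (as ActiveRecord does). For example: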

INSERT INTO iptocs (ip_from, ip_to, country_code) VALUES 
    ('xxx', 'xxx', 'xxx'), 
    ('yyy', 'yyy', 'yyy'), 
    ...;
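A query of this shape could be generated in batches from Ruby along these lines. This is a minimal sketch, not a definitive implementation: `batched_insert_sql` and `naive_quote` are hypothetical helpers, the sample rows are made up, and the default quoting only doubles single quotes. In a Rails task, `ActiveRecord::Base.connection.method(:quote)` would be a safer quoter to pass in.

```ruby
# Naive quoting for illustration only: wraps the value in single quotes and
# doubles any embedded single quote. Not safe for untrusted data.
def naive_quote(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Hypothetical helper: returns one multi-row INSERT statement per batch.
def batched_insert_sql(table, columns, rows, batch_size: 1000, quote: method(:naive_quote))
  rows.each_slice(batch_size).map do |batch|
    tuples = batch.map do |row|
      "(" + row.map { |value| quote.call(value) }.join(", ") + ")"
    end
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES\n  " + tuples.join(",\n  ") + ";"
  end
end

# Made-up sample rows; in the rake task they would be read from the CSV file.
sample_rows = [
  ["1", "10", "AA"],
  ["11", "20", "BB"],
  ["21", "30", "O'C"],
]

statements = batched_insert_sql("iptocs", %w[ip_from ip_to country_code], sample_rows, batch_size: 2)
statements.each { |sql| puts sql }

# Inside the rake task each statement would be sent with:
#   ActiveRecord::Base.connection.execute(sql)
```

Batching keeps each statement below the server's maximum packet size while still cutting the number of round trips from one per row to one per batch.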

Source

2010-02-17 23:45:41

I'm currently trying out activerecord-import, which sounds very promising:

https://github.com/zdennis/activerecord-import

Source

2012-05-02 16:27:03 reto
