
Python Peewee MySQL bulk update

I'm using Python 2.7, Peewee, and MySQL. My program reads a CSV file and updates a field whenever the order number appears in the CSV. There can be 2,000-3,000 updates, and my naive approach of updating the records one at a time is slow. I've already switched from Peewee updates to raw queries, which is somewhat faster, but it is still very slow. How can I update these records in fewer transactions, without looping over them one by one?

def mark_as_uploaded_to_zoho(self, which_file):
    print "->Started marking the order as uploaded to zoho."
    with open(which_file, 'rb') as file:
        reader = csv.reader(file)
        next(reader, None)  # skip the header row

        for r in reader:
            order_no = r[0]
            query = '''UPDATE sales SET UploadedToZoho=1 WHERE OrderNumber="%s" AND UploadedToZoho=0''' % order_no
            SalesOrderLine.raw(query).execute()

    print "->Marked as uploaded to zoho."

Answer


You can use insert_many to limit the number of transactions and speed this up considerably. It takes an iterable that yields dictionaries whose keys match the model's field names.

Depending on how many records you are trying to insert, you can either do them all at once or split them into smaller chunks. I have inserted upwards of 10,000 records at a time in the past, but depending on the database server and client specs this can be slow, so I'll show both approaches.

with open(which_file, 'rb') as file:
    reader = csv.DictReader(file)
    SalesOrderLine.insert_many(reader).execute()
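
For reference, each row handed to insert_many is just a dictionary keyed by field name. A minimal sketch, assuming the CSV headers (and therefore the DictReader keys) match the SalesOrderLine fields OrderNumber and UploadedToZoho; the order numbers below are made up for illustration:

# Hypothetical rows: keys must match the model's field names
# (assumed here to be OrderNumber and UploadedToZoho).
rows = [
    {'OrderNumber': 'SO-1001', 'UploadedToZoho': 1},
    {'OrderNumber': 'SO-1002', 'UploadedToZoho': 1},
]
SalesOrderLine.insert_many(rows).execute()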

OR

# Calls a function with chunks of an iterable, passed as a list.
# Builds each chunk in memory, so it is not memory efficient.
def chunkify(func, iterable, chunk_size):
    chunk = []
    for o in iterable:
        chunk.append(o)
        if len(chunk) >= chunk_size:
            func(chunk)
            chunk = []
    if chunk:  # flush the final partial chunk
        func(chunk)

with open(which_file, 'rb') as file:
    reader = csv.DictReader(file)
    chunkify(lambda rows: SalesOrderLine.insert_many(rows).execute(), reader, 1000)

For a more memory-efficient way to "chunkify" an iterator, check out this question.
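
That answer isn't reproduced here, but one common pattern along those lines is a generator built on itertools.islice, which holds only one chunk in memory at a time. A minimal sketch:

from itertools import islice

# Generator-based alternative to chunkify: yields lists of up to
# chunk_size items, one chunk at a time.
def chunked(iterable, chunk_size):
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield chunk

# Usage with the same reader as above:
# for rows in chunked(reader, 1000):
#     SalesOrderLine.insert_many(rows).execute()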

You can get an additional speedup by simply wrapping the inserts in with db.atomic, as outlined here.
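
Putting the pieces together, a minimal sketch (assuming `db` is the MySQLDatabase instance that SalesOrderLine is bound to, and reusing the chunked helper above) might look like this:

# Each atomic() block is one transaction, so work is committed
# per chunk instead of per row.
with open(which_file, 'rb') as file:
    reader = csv.DictReader(file)
    for rows in chunked(reader, 1000):
        with db.atomic():
            SalesOrderLine.insert_many(rows).execute()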