2016-01-06

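I want to write a simple script that takes a CSV file as input and writes it into a single spreadsheet document. It works now, but the script is slow: writing roughly 350 rows across two worksheets takes about 10 minutes. Can I write whole rows at once to Google Spreadsheets from Python using gspread?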

Here is the script I have:

#!/usr/bin/python
from __future__ import print_function  # print() behaves the same on Python 2 and 3
import json
import sys

import gspread
from oauth2client.client import SignedJwtAssertionCredentials  # available in oauth2client < 2.0

json_key = json.load(open('client_secrets.json'))
scope = ['https://spreadsheets.google.com/feeds']

# change to True to see debug messages
DEBUG = False

def updateSheet(csv_lines, sheet):
    credentials = SignedJwtAssertionCredentials(json_key['client_email'], json_key['private_key'], scope)

    gc = gspread.authorize(credentials)

    wks = gc.open("Test Spreadsheet")
    worksheet = wks.get_worksheet(sheet)
    if worksheet is None:
        if sheet == 0:
            worksheet = wks.add_worksheet("First Sheet", 1, 8)
        elif sheet == 1:
            worksheet = wks.add_worksheet("Second Sheet", 1, 8)
        else:
            print("Error: worksheet does not exist")
            sys.exit(1)

    # shrink the sheet to one row so yesterday's entries are removed
    worksheet.resize(1, 8)

    row = 1  # starting row in spreadsheet: 1
    for i in csv_lines:
        line = i.rstrip('\n').split(",")
        if DEBUG:
            print("line:", line)
        # restart at column A (1) for every line; one API request per cell,
        # which is what makes the script slow
        for col, value in enumerate(line, start=1):
            if DEBUG:
                print("writing cell:", row, col, value)
            worksheet.update_cell(row, col, value)

        row += 1
        worksheet.resize(row, 8)

I'm a sysadmin, so I apologize in advance for the poor code.

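Anyway, the script reads the CSV line by line, splits each line on commas and writes the values one cell at a time, which is why it takes so long. The idea is to have cron run it once a day; each run deletes the old entries and writes the new ones, which is why I use resize().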
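Splitting each line with `split(",")` breaks as soon as a field contains a quoted comma, and it leaves the newline on the last field. A minimal sketch using Python's standard `csv` module instead, assuming Python 3 and an ordinary comma-separated file; the function name `read_rows` is hypothetical:

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of rows (each a list of strings).

    csv.reader honours quoting, so "Berlin, DE" stays one field, and it
    drops the line terminator, so no rstrip('\n') is needed.
    """
    # csv.reader yields [] for blank lines; skip those
    return [row for row in csv.reader(io.StringIO(text)) if row]


rows = read_rows('host1,"Berlin, DE",up\nhost2,Paris,down\n')
print(rows)
```

For a file on disk, `with open('data.csv', newline='') as f: rows = list(csv.reader(f))` does the same.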

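Now I'm wondering: is there a better way to take a whole CSV line and write it to the worksheet, with each value in its own cell, instead of writing cell by cell as I do now? That should significantly reduce the execution time.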

Thanks!

Answer


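Yes, this can be done. I have uploaded data in blocks of 100 rows by 12 columns and it handled them fine; I'm not sure how well that scales to something like a whole CSV in one go, though. Also note that a worksheet has 1000 rows by default, and you will get an error if you reference a row outside that range (so call add_rows beforehand to make sure there is enough room). A simple example: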
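Splitting a large CSV into blocks like the 100-row ones mentioned above mostly comes down to computing an A1 range per block. A minimal sketch, assuming every row has the same number of columns as the first; `column_letter` and `chunk_ranges` are hypothetical helpers, and `column_letter` is a general replacement for a hard-coded letter list that stops at a fixed column. The sheet must already have enough rows (see `add_rows` above) before these ranges are used:

```python
def column_letter(n):
    """Convert a 1-based column index to A1 letters: 1 -> 'A', 27 -> 'AA'."""
    letters = ''
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord('A') + rem) + letters
    return letters


def chunk_ranges(rows, chunk_size=100, start_row=1):
    """Yield (a1_range, chunk) pairs covering `rows` in blocks of chunk_size rows.

    Assumes every row is as wide as the first one.
    """
    if not rows:
        return
    width = len(rows[0])
    for offset in range(0, len(rows), chunk_size):
        chunk = rows[offset:offset + chunk_size]
        first = start_row + offset
        last = first + len(chunk) - 1
        yield 'A%d:%s%d' % (first, column_letter(width), last), chunk


data = [[0] * 12 for _ in range(250)]
for a1, chunk in chunk_ranges(data):
    print(a1, len(chunk))
```

Each yielded range can then go to `worksheet.range(...)` and one `update_cells(...)` call per block.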

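import itertools

data_to_upload = [[1, 2], [3, 4]]

# Column letters indexed by column number ('' pads index 0); only covers up to 27 columns (AA)
column_names = ['','A','B','C','D','E','F','G','H','I','J','K','L','M','N',
                'O','P','Q','R','S','T','U','V','W','X','Y','Z','AA']

# To make it dynamic, assuming that all rows contain the same number of elements
cell_range = 'A1:' + column_names[len(data_to_upload[0])] + str(len(data_to_upload))

# `worksheet` is the gspread worksheet opened as in the question's script;
# range() fetches every cell in the block with a single request
cells = worksheet.range(cell_range)

# Flatten the nested list row by row. 'cells' is a flat list and does not accept x/y indexing.
flattened_data = list(itertools.chain.from_iterable(data_to_upload))

# Go based on the length of flattened_data, not cells.
# This is because if you chunk large data into blocks, all excess cells keep an empty value.
# Doing it the other way around will give an index out of range error.
for x in range(len(flattened_data)):
    value = flattened_data[x]
    # decode byte strings (e.g. CSV values under Python 2) so special characters
    # don't crash; numbers are passed through unchanged
    if isinstance(value, bytes):
        value = value.decode('utf-8')
    cells[x].value = value

# Write all modified cells back in one batch request
worksheet.update_cells(cells)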
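For the daily cron job in the question, the whole per-cell loop could be replaced by one resize, one `range()` and one `update_cells()` call. A minimal sketch, assuming `worksheet` is a gspread Worksheet (only its `resize()`, `range()` and `update_cells()` methods and each Cell's `row`, `col` and `value` attributes are used) and that the sheet has at most 26 columns; `upload_rows` is a hypothetical name, and how large a single batch the API accepts was not tested here:

```python
def upload_rows(worksheet, rows, ncols):
    """Overwrite `worksheet` with `rows` using a single batch update.

    Assumes no row has more than `ncols` values and 1 <= ncols <= 26,
    so the last column is a single letter.
    """
    if not rows:
        return
    nrows = len(rows)
    # Size the sheet to exactly the data: old rows beyond it are dropped
    # and every address in the range below exists.
    worksheet.resize(nrows, ncols)
    last_col = chr(ord('A') + ncols - 1)
    # One request fetches every Cell in the block.
    cells = worksheet.range('A1:%s%d' % (last_col, nrows))
    for cell in cells:
        row = rows[cell.row - 1]
        # Address values by each Cell's own row/col so short rows cannot
        # shift later values; missing values become '' and clear old content.
        cell.value = row[cell.col - 1] if cell.col <= len(row) else ''
    # One request writes everything back.
    worksheet.update_cells(cells)
```

Called as `upload_rows(worksheet, rows, 8)` with the parsed CSV rows, this makes a fixed number of requests per sheet instead of one per cell. Using each cell's coordinates rather than its position in a flattened list is a design choice that makes padding ragged rows unnecessary.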

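If the rows have different lengths, you obviously need to pad the data with the appropriate number of empty strings so that the two lists (the flattened values and `cells`) don't get out of sync. I used decode for convenience: I kept getting crashes on special characters, so it seemed best to just put it in.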
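The padding step described above can be sketched as a small helper that also takes the place of the `flatten` step. This assumes rows are lists of values and that the target range is exactly `width` columns wide; `pad_and_flatten` is a hypothetical name:

```python
from itertools import chain


def pad_and_flatten(rows, width=None):
    """Pad each row with '' to a common width, then flatten row by row.

    The result lines up one-to-one with the row-major list of cells that
    worksheet.range() returns for a block `width` columns wide.
    """
    if width is None:
        width = max(len(row) for row in rows)
    padded = []
    for row in rows:
        if len(row) > width:
            # refuse to silently drop values that would not fit the range
            raise ValueError('row has %d values, more than width %d' % (len(row), width))
        padded.append(list(row) + [''] * (width - len(row)))
    return list(chain.from_iterable(padded))


print(pad_and_flatten([['a', 'b', 'c'], ['d']]))
```

With the padded list, `len(flattened_data)` equals the number of cells in the range, so index `x` always refers to the same row and column in both lists.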
