2016-10-03

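Google App Engine and Google Sheets: exceeded soft private memory limit

I'm writing a simple service that pulls data from a few sources, aggregates it, and sends it to a Google Sheet using the Google API client. Easy peasy, it works well, and the data isn't that big.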

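The problem is that calling .spreadsheets() after building the API service (i.e. build('sheets', 'v4', http=auth).spreadsheets()) causes a memory jump of about 30 MB (I did some profiling to isolate where the memory was being allocated). When deployed to GAE, these spikes persist for long stretches (sometimes hours), creep upward, and after several requests trigger GAE's "Exceeded soft private memory limit" error.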
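One way to reproduce the kind of per-call measurement described above is to wrap a single call with the standard-library `tracemalloc` module. This is a minimal sketch, not the asker's profiling method: `measure_peak_allocation` is a hypothetical helper name, and `tracemalloc` requires Python 3.4+, so on the Python 2.7 App Engine runtime of that era it is something to run locally.

```python
import gc
import tracemalloc


def measure_peak_allocation(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, peak_bytes).

    peak_bytes is the highest amount of memory traced by tracemalloc
    while fn was running. Only allocations made through Python's
    allocators are counted, so this can differ from the process-level
    figure a hosting platform reports.
    """
    gc.collect()  # drop garbage left over from earlier work first
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # stop tracing even if fn raises
    return result, peak


if __name__ == "__main__":
    # Stand-in workload; in the question's setup this could be
    # lambda: build('sheets', 'v4', http=auth).spreadsheets()
    _, peak = measure_peak_allocation(lambda: [0] * 1000000)
    print("peak during call: %.1f MB" % (peak / 1e6))
```

Measuring one line at a time like this is what lets you attribute a spike to `.spreadsheets()` rather than to `build()` or to the data aggregation.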

I'm using memcache for the discovery document and urlfetch to fetch the data, but those are the only other services I'm using.

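I've tried manual garbage collection, changing `threadsafe` in app.yaml, and even things like changing where .spreadsheets() is called, and I can't shake the problem. I may also be misunderstanding GAE's architecture, but I know the spike is caused by the call to .spreadsheets(), and I'm not storing anything in a local cache.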

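Is there a way to 1) reduce the size of the memory spike caused by calling .spreadsheets(), or 2) stop the spikes from staying in memory (or, preferably, both)? Below is a very simplified gist illustrating the API call and the request handler; I can provide more complete code if needed. I know similar questions have been asked before, but I couldn't solve it with them.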

https://gist.github.com/chill17/18f1caa897e6a202aca05239


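Actually, I found [Issue #7973](https://code.google.com/p/googleappengine/issues/detail?id=7973) and [Issue #12220](https://code.google.com/p/googleappengine/issues/detail?id=12220&can=1&q=Exceeded%20soft%20private%20memory&colspec=ID%20Type%20Component%20Status%20Stars%20Summary%20Language%20Priority%20Owner%20Log) in the issue tracker, which relate to the "Exceeded soft private memory limit" problem you're running into. According to those threads, the issue has not been fully resolved, and the workaround given in one of them doesn't seem relevant to your case either. – Teyam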

Answer


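I ran into this when using the Spreadsheets API on a small processor with only 20 MB of available RAM. The problem is that the Google API client fetches the entire API description (the discovery document) as a string and builds resource objects from it, all of which are held in memory.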

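If free memory is a concern, you should build your own HTTP object and make the requests you need manually. Here is my Spreadsheet() class as an example of how to create a new spreadsheet this way: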

```python
import json
import os

import httplib2
from oauth2client import client, tools
from oauth2client.file import Storage

# Command-line flags for the OAuth flow (same pattern as the Sheets quickstart)
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None

SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Google Sheets API Python Quickstart'


class Spreadsheet(object):

    def __init__(self, title):
        # Get credentials from a locally stored JSON file;
        # run the OAuth flow and create the file if it does not exist
        self.credentials = self.getCredentials()

        # Authorized HTTP object used to push/pull data. No discovery
        # document is downloaded, so no Resource objects are built.
        self.service = self.credentials.authorize(httplib2.Http())
        self.headers = {'content-type': 'application/json',
                        'accept-encoding': 'gzip, deflate',
                        'accept': 'application/json',
                        'user-agent': 'google-api-python-client/1.6.2 (gzip)'}

        print("CREDENTIALS: " + str(self.credentials))

        self.baseUrl = "https://sheets.googleapis.com/v4/spreadsheets"
        self.spreadsheetInfo = self.create(title)
        self.spreadsheetId = self.spreadsheetInfo['spreadsheetId']

    def getCredentials(self):
        """Gets valid user credentials from storage.

        If nothing has been stored, or if the stored credentials are invalid,
        the OAuth2 flow is completed to obtain the new credentials.

        Returns:
            Credentials, the obtained credential.
        """
        home_dir = os.path.expanduser('~')
        credential_dir = os.path.join(home_dir, '.credentials')
        if not os.path.exists(credential_dir):
            os.makedirs(credential_dir)
        credential_path = os.path.join(credential_dir,
                                       'sheets.googleapis.com-python-quickstart.json')

        store = Storage(credential_path)
        credentials = store.get()
        if not credentials or credentials.invalid:
            flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
            flow.user_agent = APPLICATION_NAME
            if flags:
                credentials = tools.run_flow(flow, store, flags)
            else:  # Needed only for compatibility with Python 2.6
                credentials = tools.run(flow, store)
            print('Storing credentials to ' + credential_path)
        return credentials

    def create(self, title):
        # Only put the title in the request body; nothing else is needed for now
        requestBody = {
            "properties": {
                "title": title
            },
        }

        print("BODY: " + str(requestBody))

        # Serialize with json.dumps, not str(): str() of a dict uses
        # single quotes, which is not valid JSON
        response, content = self.service.request(self.baseUrl,
                                                 method="POST",
                                                 headers=self.headers,
                                                 body=json.dumps(requestBody))
        print("\n\nRESPONSE\n" + str(response))
        print("\n\nCONTENT\n" + str(content))

        # Fail loudly instead of hitting a KeyError on 'spreadsheetId'
        if response.status != 200:
            raise RuntimeError('Sheets API returned HTTP %d: %s'
                               % (response.status, content))
        return json.loads(content)
```
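If even httplib2 and oauth2client are more than you want to load, the same "make the requests yourself" idea can be sketched with only the Python 3 standard library. This is a hedged sketch: the function names are hypothetical, the endpoints and body shapes follow the Sheets API v4 REST reference for `spreadsheets.create` and `spreadsheets.values.append`, and the access token is assumed to come from whatever OAuth 2.0 setup you already have (with oauth2client credentials like the ones in the class above, `credentials.get_access_token().access_token` should provide one). App Engine's Python 2.7 runtime of that era had `urllib2` rather than `urllib.request`, so this would need adapting there.

```python
import json
import urllib.parse
import urllib.request

SHEETS_BASE_URL = "https://sheets.googleapis.com/v4/spreadsheets"


def _json_post(url, payload, access_token):
    """Build (but do not send) an authorized JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
    )


def build_create_request(title, access_token):
    """spreadsheets.create: a new spreadsheet with only its title set."""
    return _json_post(SHEETS_BASE_URL, {"properties": {"title": title}}, access_token)


def build_append_request(spreadsheet_id, cell_range, rows, access_token):
    """spreadsheets.values.append: add rows after the table found in cell_range."""
    url = "{}/{}/values/{}:append?{}".format(
        SHEETS_BASE_URL,
        spreadsheet_id,
        urllib.parse.quote(cell_range),  # e.g. "Sheet1!A1" -> "Sheet1%21A1"
        urllib.parse.urlencode({"valueInputOption": "RAW"}),
    )
    return _json_post(url, {"values": rows}, access_token)


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Typical use would be `created = send(build_create_request("Report", token))` followed by `send(build_append_request(created["spreadsheetId"], "Sheet1!A1", [["a", 1]], token))`. Building requests separately from sending them keeps the request-shaping logic testable without network access, and nothing here downloads or parses the discovery document.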