Fail-safe datastore updates on App Engine

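The App Engine datastore certainly has downtime. However, I'd like to have a "fail-safe" put that is more robust in the face of datastore errors (see the motivation below). The task queue seems like the obvious place to defer writes when the datastore is unavailable. I don't know of any other solution (other than shipping the data off to a third party via urlfetch).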

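Motivation: I have an entity that really needs to be put in the datastore; simply showing an error message to the user won't do. For example, some side effect may already have taken place that can't easily be undone (perhaps some interaction with a third-party site).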

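I've come up with a simple wrapper which (I think) provides a reasonable "fail-safe" put (see below). Do you see any problems with it, or have an idea for a more robust implementation? (Note: thanks to suggestions posted in the answers by Nick Johnson and Saxon Druce, this post has been edited with some improvements to the code.)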

import logging
from google.appengine.api.labs.taskqueue import taskqueue
from google.appengine.datastore import entity_pb
from google.appengine.ext import db
from google.appengine.runtime.apiproxy_errors import CapabilityDisabledError

def put_failsafe(e, db_put_deadline=20, retry_countdown=60, queue_name='default'):
    """Tries to e.put(). On success, 1 is returned. If this raises a db.Error
    or CapabilityDisabledError, then a task will be enqueued to try to put the
    entity (the task will execute after retry_countdown seconds) and 2 will be
    returned. If the task cannot be enqueued, then 0 will be returned. Thus a
    falsey value is only returned on complete failure.

    Note that since the taskqueue payloads are limited to 10kB, if the protobuf
    representing e is larger than 10kB then the put will be unable to be
    deferred to the taskqueue.

    If a put is deferred to the taskqueue, then it won't necessarily be
    completed as soon as the datastore is back up. Thus it is possible that
    e.put() will occur *after* other, later puts when 1 is returned.

    Ensure e's model is imported in the code which defines the task which tries
    to re-put e (so that e can be deserialized).
    """
    try:
        # Give up on the direct put after db_put_deadline seconds.
        e.put(rpc=db.create_rpc(deadline=db_put_deadline))
        return 1
    except (db.Error, CapabilityDisabledError), ex1:
        try:
            # Fall back to the task queue, sending the entity as an encoded protobuf.
            taskqueue.add(queue_name=queue_name,
                          countdown=retry_countdown,
                          url='/task/retry_put',
                          payload=db.model_to_protobuf(e).Encode())
            logging.info('failed to put to db now, but deferred put to the taskqueue e=%s ex=%s' % (e, ex1))
            return 2
        except (taskqueue.Error, CapabilityDisabledError), ex2:
            return 0
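
The control flow of put_failsafe can be exercised without the App Engine SDK by injecting the two operations as plain callables. The following is only a minimal sketch of that idea: PutError, EnqueueError and put_with_fallback are hypothetical names made up for illustration, standing in for db.Error, taskqueue.Error and the wrapper above.

```python
# Hypothetical stand-ins: these are NOT App Engine names. They take the
# place of db.Error and taskqueue.Error so the sketch runs anywhere.
class PutError(Exception):
    pass

class EnqueueError(Exception):
    pass

# Same return codes as put_failsafe: only total failure is falsey.
PUT_OK, PUT_DEFERRED, PUT_FAILED = 1, 2, 0

def put_with_fallback(put, enqueue):
    """Call put(); if it raises PutError, try enqueue() instead."""
    try:
        put()
        return PUT_OK
    except PutError:
        try:
            enqueue()
            return PUT_DEFERRED
        except EnqueueError:
            return PUT_FAILED

def noop():
    """Simulates an operation that succeeds."""
    pass

def failing_put():
    """Simulates the datastore being unavailable."""
    raise PutError('datastore unavailable')

def failing_enqueue():
    """Simulates the task queue being unavailable too."""
    raise EnqueueError('task queue unavailable')
```

A caller can then branch on the result: a truthy value means the write either happened or was queued, and only PUT_FAILED needs an error message shown to the user.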

Request handler:

from google.appengine.datastore import entity_pb
from google.appengine.ext import db, webapp

# IMPORTANT: This task deserializes entity protobufs. To ensure that this is
#   successful, you must import any db.Model that may need to be
#   deserialized here (otherwise this task may raise a KindError).

class RetryPut(webapp.RequestHandler):
    def post(self):
        # The task payload is the encoded entity protobuf sent by put_failsafe.
        e = db.model_from_protobuf(entity_pb.EntityProto(self.request.body))
        e.put() # failure will raise an exception => the task to be retried

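I don't expect to use this for every put; most of the time, showing an error message is just fine. It is tempting to use it for every put, but I think it may sometimes be more confusing to users if I tell them that their changes will appear later (and keep showing them the old data until the datastore is back up and the deferred puts have executed).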

Source

2010-09-26 David Underhill

A related question: is there any correlation between datastore downtime and task queue downtime? (http://stackoverflow.com/questions/3800252/datastore-and-task-queue-downtime-correlation) – 2010-09-26 23:25:45

Your approach is reasonable, but it has several caveats:

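By default, a put operation will retry until it runs out of time. Since you have a backup strategy, you may want to give up sooner; in that case, supply an rpc parameter to the put method call, specifying a custom deadline.
There's no need to set an explicit countdown: the task queue will retry failing operations for you at increasing intervals.
You don't need to use pickle: Protocol Buffers have a natural string encoding, which is much more efficient. See this post for a demonstration of how to use it. As Saxon points out, task queue payloads are limited to 10 kilobytes, so you may run into trouble with large entities.
Most importantly, this changes the datastore consistency model from "strongly consistent" to "eventually consistent". That is, a put that you enqueued on the task queue could be applied at any time in the future, overwriting any changes made in the interim. Any number of race conditions are possible, which essentially renders transactions useless while puts are pending on the task queue.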
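
To make the last caveat concrete, here is a minimal in-memory sketch of the race and of one possible guard against it. Nothing in it comes from the question's code: the dict-based store, the version counter and the names put_now and apply_deferred are all assumptions made for illustration. The idea is that a deferred write remembers which version it was based on and is dropped if a newer write has landed in the meantime.

```python
# In-memory stand-in for the datastore (an assumption, not the real API):
# key -> {'version': int, 'value': object}
store = {}

def put_now(key, value):
    """Immediate write: bump the version and store the value."""
    current = store.get(key, {'version': 0})
    new_version = current['version'] + 1
    store[key] = {'version': new_version, 'value': value}
    return new_version

def apply_deferred(key, value, based_on_version):
    """Apply a queued write only if nothing newer was written meanwhile."""
    current = store.get(key, {'version': 0})
    if current['version'] != based_on_version:
        return False  # a later write landed first; drop the stale one
    put_now(key, value)
    return True

# Scenario: a write is deferred, then a later direct write succeeds.
seen = put_now('profile', 'old')    # the version the deferred write saw
put_now('profile', 'newer')         # a later change made in the interim
applied = apply_deferred('profile', 'stale queued value', seen)
# applied is False and 'newer' is kept, instead of being overwritten.
```

On the real datastore the compare-and-write step would presumably have to run inside a transaction (for example via db.run_in_transaction) to be safe. Dropping the stale write is only one policy, and it may not suit an entity that must be written no matter what.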

Source

2010-09-28 10:09:16

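Thanks for the detailed feedback; I'll definitely incorporate these ideas. The only reason I set a countdown was that I figured it would keep the task queue from immediately trying to re-put the entity (since the put just failed, maybe it should be given a little time [though the default of 60 seconds may be too much if the problem is transient, e.g. a tablet split]). – 2010-09-28 10:16:21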

One potential problem is that tasks are limited to 10kb of data, so if you have an entity that is larger than that once pickled, this won't work.
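
A size check before enqueueing would let the caller detect this case up front instead of relying on the task queue to reject the payload. The sketch below is a hedged illustration: fits_in_task_payload and MAX_TASK_PAYLOAD_BYTES are hypothetical names, and treating "10kb" as exactly 10 * 1024 bytes is an assumption. The encoded argument is assumed to be the byte string that the question's code obtains from db.model_to_protobuf(e).Encode().

```python
# Assumption: the 10 kB limit mentioned above, taken as 10 * 1024 bytes.
MAX_TASK_PAYLOAD_BYTES = 10 * 1024

def fits_in_task_payload(encoded, limit=MAX_TASK_PAYLOAD_BYTES):
    """Return True if the encoded entity is small enough to enqueue."""
    return len(encoded) <= limit

# Stand-in payloads; in the real code these would be encoded protobufs.
small_payload = b'x' * 512
large_payload = b'x' * (MAX_TASK_PAYLOAD_BYTES + 1)
```

With a check like this, an oversized entity could be reported as a complete failure right away rather than after a failed attempt to add the task.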

Source

2010-09-28 06:01:55

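Good point; fortunately, I don't have to worry about that for the entities I'm using this with. But I'll update the code's docstring to reflect this limitation. – 2010-09-28 10:13:25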
