2010-09-29 87 views

python-couchdb pager hits the recursion depth limit

I'm building a pager that returns the documents of an Apache CouchDB map function via python-couchdb. This generator works well until it hits the maximum recursion depth. How can I improve it to iterate instead of recurse?

def page(db, view_name, limit, include_docs=True, **opts):
    """
    `page` returns all documents of a CouchDB map function. It accepts
    all options that `couchdb.Database.view` does; however, `include_docs`
    should be omitted, because it would interfere with the paging.

    >>> import couchdb
    >>> db = couchdb.Server()['database']
    >>> for doc in page(db, '_all_docs', 100):
    ...     doc
    # etc etc
    >>> del db['database']

    Notes on implementation:
     - `last_doc` is assigned on every loop, because there doesn't seem to
       be an easy way to know if something is the last item in the iteration.
    """

    last_doc = None
    for row in db.view(view_name,
                       limit=limit + 1,
                       include_docs=include_docs,
                       **opts):
        last_doc = row.key, row.id
        yield row.doc
    if last_doc:
        for doc in page(db, view_name, limit,
                        include_docs=include_docs,
                        startkey=last_doc[0],
                        startkey_docid=last_doc[1]):
            yield doc
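The failure mode can be reproduced without CouchDB at all; a minimal sketch with a toy `pager` generator (`pager` is an illustrative name, not the real code):

```python
import sys

def pager(n):
    # Toy stand-in for the CouchDB pager: yield one "row", then delegate
    # to a recursively created generator, just as page() delegates to page().
    yield n
    if n > 0:
        for item in pager(n - 1):
            yield item

try:
    # every nested "for ... yield" level adds a Python stack frame, so a
    # result set deeper than the interpreter's limit blows up
    list(pager(sys.getrecursionlimit() * 2))
except RuntimeError:  # RecursionError on Python 3
    print("maximum recursion depth exceeded")
```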
I can't read this code. I'm not a fan of PEP8 parroting, but please at least use 4-space indentation. – 2010-09-29 23:55:02

This doesn't really answer the question, but a useful note: you can change the maximum recursion depth with `sys.setrecursionlimit()`. – 2010-09-30 00:15:48

Thanks @Rafe, I know, but since I'm returning a few hundred thousand rows, I don't want to kill the machine. – 2010-09-30 00:26:04
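For reference, a minimal sketch of that knob (note that raising it only postpones the problem and trades headroom for C-stack usage):

```python
import sys

old_limit = sys.getrecursionlimit()   # CPython's default is usually 1000
sys.setrecursionlimit(old_limit * 2)  # allow deeper recursion, at the
                                      # cost of more C-stack usage
print(sys.getrecursionlimit())
sys.setrecursionlimit(old_limit)      # restore the original limit
```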

Answers


Here is something to get you started. You didn't specify what `*opts` might contain; if all you need to seed the iteration is `startkey` and `startkey_docid`, and no other fields, you can drop the extra function.

Obviously untested.

def page_key(db, view_name, limit, startkey, startkey_docid, inc_docs=True):
    queue = [(startkey, startkey_docid)]
    while queue:
        key = queue.pop()

        rows = 0
        last_doc = None
        for row in db.view(view_name,
                           limit=limit + 1,
                           include_docs=inc_docs,
                           startkey=key[0],
                           startkey_docid=key[1]):
            rows += 1
            if rows == 1:
                continue  # the first row repeats the previous batch's last row
            last_doc = row.key, row.id
            yield row.doc

        if rows == limit + 1 and last_doc:
            queue.append(last_doc)  # batch was full; there may be more rows

def page(db, view_name, limit, inc_docs=True, **opts):
    last_doc = None
    for row in db.view(view_name,
                       limit=limit + 1,
                       include_docs=inc_docs,
                       **opts):
        last_doc = row.key, row.id
        yield row.doc

    if last_doc:
        for doc in page_key(db, view_name, limit, last_doc[0], last_doc[1],
                            inc_docs):
            yield doc

An alternative approach, which I've tested (manually) against a database of >800K documents. It seems to work.

def page2(db, view_name, limit, inc_docs=True, **opts):
    def get_batch(db=db, view_name=view_name, limit=limit, inc_docs=inc_docs, **opts):
        for row in db.view(view_name, limit=limit + 1, include_docs=inc_docs, **opts):
            yield row

    last_doc = None
    total_rows = db.view(view_name, limit=1).total_rows
    batches = (total_rows / limit) + 1
    for i in xrange(batches):
        if not last_doc:
            for row in get_batch():
                last_doc = row.key, row.id
                yield row.doc or row  # if include_docs is False,
                                      # row.doc will be None
        else:
            for row in get_batch(startkey=last_doc[0],
                                 startkey_docid=last_doc[1]):
                last_doc = row.key, row.id
                yield row.doc or row

I don't use CouchDB, so I had a little trouble following the example code. Here is a stripped-down version that I believe works the way you want:

all_docs = range(0, 100)

def view(limit, offset):
    print "view: returning", limit, "rows starting at", offset
    return all_docs[offset:offset + limit]

def generate_by_pages(page_size):
    offset = 0
    while True:
        rowcount = 0
        for row in generate_page(page_size, offset):
            rowcount += 1
            yield row
        if rowcount == 0:
            break
        else:
            offset += rowcount

def generate_page(page_size, offset):
    for row in view(page_size, offset):
        yield row

for r in generate_by_pages(10):
    print r

The key is to replace the recursion with iteration. There are many ways to do that (I like trampolines in Python), but the above is simple.
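As an aside, the trampoline idea mentioned above can be sketched like this (`trampoline` and `countdown` are illustrative names, not part of the answer):

```python
def trampoline(gen):
    # Run a generator-based "recursive" computation iteratively: nested
    # generators are kept on an explicit list instead of the call stack,
    # so the interpreter's recursion limit is never approached.
    stack = [gen]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()
            continue
        if hasattr(item, '__next__') or hasattr(item, 'next'):
            stack.append(item)  # a yielded generator is a "recursive call"
        else:
            yield item          # anything else is a real result

def countdown(n):
    # "recurses" by yielding a sub-generator instead of iterating it
    yield n
    if n > 0:
        yield countdown(n - 1)

print(list(trampoline(countdown(3))))  # [3, 2, 1, 0]
```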