Difference between queryset_iterator and Django Paginator

I've been discussing with a colleague how to iterate over large tables through the Django ORM. Up to now I have been using a queryset_iterator implementation, shown here:
import gc

def queryset_iterator(queryset, chunksize=1000):
    '''
    Iterate over a Django Queryset ordered by the primary key.

    This method loads a maximum of chunksize (default: 1000) rows in its
    memory at the same time, while Django normally would load all rows in its
    memory. Using the iterator() method only causes it to not preload all the
    classes.

    Note that the implementation of the iterator does not support ordered query sets.
    '''
    # Starting at 0 assumes positive integer primary keys.
    pk = 0
    # Highest pk marks the end; this lookup raises IndexError on an empty queryset.
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        # Fetch the next chunk of rows whose pk is above the last one seen.
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        # Run a garbage collection pass between chunks.
        gc.collect()
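My colleague suggested using Django's Paginator and passing the queryset into it. It looks like similar work would get done; the only difference I can find is that Paginator does not make any garbage collection calls.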
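To make the comparison concrete, here is a rough sketch of the query pattern I believe each approach produces. It uses sqlite3 instead of Django so that it runs on its own. The `item` table and the helper names `keyset_chunks` and `offset_pages` are made up for illustration, and it rests on my assumption that fetching a page from a Paginator slices the queryset (LIMIT/OFFSET, plus a COUNT for the total), whereas queryset_iterator filters on the primary key.

```python
import sqlite3

# Hypothetical table standing in for a large Django model table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item (id, name) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 11)],
)


def keyset_chunks(conn, chunksize):
    """Roughly the queryset_iterator pattern: filter on pk > last seen pk."""
    pk = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM item WHERE id > ? ORDER BY id LIMIT ?",
            (pk, chunksize),
        ).fetchall()
        if not rows:
            break
        yield from rows
        pk = rows[-1][0]  # remember the last primary key of this chunk


def offset_pages(conn, per_page):
    """Roughly the page-by-page pattern (assumed): COUNT, then LIMIT/OFFSET."""
    (count,) = conn.execute("SELECT COUNT(*) FROM item").fetchone()
    for offset in range(0, count, per_page):
        yield from conn.execute(
            "SELECT id, name FROM item ORDER BY id LIMIT ? OFFSET ?",
            (per_page, offset),
        ).fetchall()


keyset_ids = [row[0] for row in keyset_chunks(conn, 4)]
offset_ids = [row[0] for row in offset_pages(conn, 4)]
```

On a table that does not change, both loops return the same rows in the same order. If my reading is right, the practical differences would be that OFFSET makes the database skip over earlier rows on every page, and that rows inserted or deleted during iteration can shift page boundaries, while the pk filter always resumes from the last row seen.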
Can anyone shed light on the difference between the two? Is there one at all?