MongoDB, Python: upserting records in batches of 10,000

I have to insert records into MongoDB. I used a simple approach, but it did not work. Please help me fix it.

from pymongo import MongoClient 
import json 
import sys 
import os 
client = MongoClient('localhost', 9000) 
db1 = client['Com_Crawl'] 
collection1 = db1['All'] 
posts1 = collection1.posts 
ll=[] 
f=file(sys.argv[1],'r') 
for i in f: 
    j=json.loads(i) 
    ll.append(j) 
#print ll 
print len(ll) 
count = 0 
for l in ll: 
    count = count+1 
    if count <= 10000: 
     print count,l 
     print posts1.update({'vtid':l},{'$set': {'processed': 0}},upsert = True,multi = True) 
print "**** Success ***"

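The file contains 10 million records. The code above adds a new field and sets it to 0, but only for the first 10,000 records. How can I process the remaining records as well, in batches of 10,000 each?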

Source

2017-05-04 NiviSRa

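Not sure about the batch part, but this loop only upserts while `count <= 10000`, and the count is never reset. So once you hit 10,000 records, no further upserts happen. – ktbiz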

Yes, but how can I reset the count for the next set of values? – NiviSRa

You probably just want to step in increments of 10,000 and insert a slice of `ll` at each step. Use `range` instead of iterating over every element. – ktbiz
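A minimal sketch of what ktbiz's suggestion could look like: step through the list in strides of 10,000 with `range` and hand each slice to a per-batch function. The names `iter_slices` and `process_in_batches` are hypothetical, and the per-batch function here only records batch sizes; in the question's script it would perform the upserts.

```python
def iter_slices(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(items, size, process_batch):
    """Call `process_batch` once per slice and return how many batches ran."""
    batches = 0
    for batch in iter_slices(items, size):
        process_batch(batch)
        batches += 1
    return batches


if __name__ == "__main__":
    ll = list(range(25000))  # stand-in for the parsed records
    sizes = []
    n = process_in_batches(ll, 10000, lambda batch: sizes.append(len(batch)))
    print(n, sizes)  # prints: 3 [10000, 10000, 5000]
```

Because the slice bounds come from `range`, no counter has to be reset by hand, and the last, shorter slice is handled the same way as the full ones.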

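MongoDB has bulk update operations, which update the database in bulk. You can queue any number of operations and execute them in one call, although internally they are sent in batches of 1,000. See the MongoDB documentation on ordered and unordered bulk operations, and on bulk updates, for how bulk operations work. With a bulk update, the code becomes: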

from pymongo import MongoClient
import json
import sys

client = MongoClient('localhost', 9000)
db1 = client['Com_Crawl']
collection1 = db1['All']
posts1 = collection1.posts

# Read one JSON value per line from the file given on the command line.
with open(sys.argv[1], 'r') as f:
    ll = [json.loads(line) for line in f]
print(len(ll))

# Legacy bulk API (PyMongo 2.7 to 3.x; removed in PyMongo 4).
bulk = posts1.initialize_unordered_bulk_op()
pending = 0
for l in ll:
    # upsert() is chained on find(); update() takes no upsert/multi keywords.
    bulk.find({'vtid': l}).upsert().update({'$set': {'processed': 0}})
    pending += 1
    if pending == 10000:
        bulk.execute()  # send this batch of 10000 operations
        bulk = posts1.initialize_unordered_bulk_op()  # start the next batch
        pending = 0
if pending:
    bulk.execute()  # send the remaining operations; execute() raises on an empty batch

As Joe D pointed out, you can also skip through the records and update them in bulk.
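On PyMongo 3 and later, the same batching can be written with `bulk_write` and `UpdateMany` instead of the legacy bulk builder, which was removed in PyMongo 4. The sketch below is an assumption about how that might look here, not the answer's own method: it streams the file rather than loading all 10 million lines into a list, and `batched`, `mark_processed` and `main` are hypothetical names. The host, port, database and collection names are copied from the question.

```python
import json
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items until `iterable` is exhausted."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def mark_processed(collection, vtids, batch_size=10000):
    """Upsert processed=0 for every vtid, sending one bulk_write per batch."""
    # Imported here so that `batched` can be used without PyMongo installed.
    from pymongo import UpdateMany

    batches = 0
    for batch in batched(vtids, batch_size):
        requests = [
            UpdateMany({'vtid': vtid}, {'$set': {'processed': 0}}, upsert=True)
            for vtid in batch
        ]
        # ordered=False lets the server continue past individual failures.
        collection.bulk_write(requests, ordered=False)
        batches += 1
    return batches


def main(path):
    """Run against the server and collection named in the question."""
    from pymongo import MongoClient

    client = MongoClient('localhost', 9000)
    posts = client['Com_Crawl']['All'].posts
    with open(path) as f:
        # One JSON value per line, as in the question's input file.
        vtids = (json.loads(line) for line in f)
        print(mark_processed(posts, vtids))
```

Calling `main(sys.argv[1])` from a script would process the file named on the command line. Since `batched` never yields an empty list, `bulk_write` is never called without operations, so no special case is needed for the final batch.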

Source

2017-05-04 04:45:32 Mani

Thanks, Mani. But I got AttributeError: 'BulkOperationBuilder' object has no attribute 'update'. – NiviSRa

Oh, OK. Let me try it now. I have a question: does update() still need the upsert option? When I use find in the query, bulk.find({'vtid': l}).update({'$set': {'processed': 0}}, upsert = True, multi = True) gives TypeError: update() got an unexpected keyword argument 'upset' – NiviSRa

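When you set `upsert = True`, a new record is created if no match is found. If `upsert` is set to false, nothing happens when no match is found. Also note that in the comment you posted, the error shows the keyword spelled 'upset'. Please check that. – Mani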

You can do it this way instead.

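count = 0
while True:
    # Fetch the next page of up to 10000 documents in a stable order.
    page = list(posts1.find({}, {'_id': 1}).sort('_id', 1).skip(count*10000).limit(10000))
    if not page:
        break
    for post in page:
        posts1.update_one({'_id': post['_id']}, {'$set': {'processed': 0}})  # post is a plain dict
    count += 1
print("**** Success ***")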

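skip() does what you would expect: it skips that many entries in the result set, and limit() then caps the result at 10,000. So count gives you starting points of 0, 10000, 20000 and so on, and limit takes only the 10,000 entries after each starting point.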

Source

2017-05-04 02:46:28

Thanks, Joe D. I'll try it now. – NiviSRa

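TypeError: 'Collection' object is not callable. If you meant to call the 'skip' method on a 'Collection' object it is failing because no such method exists. I got this error. @Joe D – NiviSRa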

Apologies, I didn't notice it was a collection. I'll update it. @NiviSra –
