
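I need to build a query with PyMongo that fetches data from two related collections in a MongoDB database: for each document in one collection, I need to pull in fields from the other.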

Collection X has the fields UserId, Name, and EmailId:

[ 
    { 
    "UserId" : "941AB", 
    "Name" :  "Alex Andresson", 
    "EmailId" : "[email protected]" 
    }, 
    { 
    "UserId" : "768CD", 
    "Name" :  "Bryan Barnes", 
    "EmailId" : "[email protected]" 
    } 
] 

Collection Y has the fields UserId1, UserId2, and Rating:

[ 
    { 
    "UserId1" : "941AB", 
    "UserId2" : "768CD", 
    "Rating" : 0.8 
    } 
] 

I need to print the name and email for both UserId1 and UserId2, along with the rating, like this:

[ 
    { 
    "UserId1" : "941AB", 
    "UserName1" : "Alex Andresson", 
    "UserEmail1" : "[email protected]", 
    "UserId2" : "768CD", 
    "UserName2" : "Bryan Barnes", 
    "UserEmail2" : "[email protected]", 
    "Rating":  0.8 
    } 
] 

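This means I need to fetch data from collection Y as well as from X. I am working with PyMongo and have not been able to find a solution. Could someone give me pseudocode for the idea, or a pointer on how to approach it?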

Answer


You need to perform the join manually, or use a library that does it for you, perhaps mongoengine.

Basically, you first find the ratings you are interested in, then find the users those ratings refer to.

Example:

#!/usr/bin/env python3 

import pymongo 
from random import randrange 

client = pymongo.MongoClient() 
db = client['test'] 

# clean collections 
db['users'].drop() 
db['ratings'].drop() 

# insert data 
user_count = 100 
rating_count = 20 

db['users'].insert_many([ 
    {'UserId': i, 'Name': 'John', 'EmailId': i} 
    for i in range(user_count)]) 

db['ratings'].insert_many([ 
    {'UserId1': randrange(user_count), 'UserId2': randrange(user_count), 'Rating': i} 
    for i in range(rating_count)]) 

# don't forget the indexes 
db['users'].create_index('UserId') 
# but it would be better if we used _id as the UserId 

# if you want to make queries based on Rating value, then add also this index: 
db['ratings'].create_index('Rating') 

# now print ratings with users that have value 10+ 

# simple approach: 
ratings = db['ratings'].find({'Rating': {'$gte': 10}}) 
for rating in ratings: 
    u1 = db['users'].find_one({'UserId': rating['UserId1']}) 
    u2 = db['users'].find_one({'UserId': rating['UserId2']}) 
    print('Rating between {} (UserId {:2}) and {} (UserId {:2}) is {:2}'.format(
     u1['Name'], u1['UserId'], u2['Name'], u2['UserId'], rating['Rating'])) 

print('---') 

# optimized approach: 
ratings = list(db['ratings'].find({'Rating': {'$gte': 10}})) 
user_ids = {r['UserId1'] for r in ratings} 
user_ids |= {r['UserId2'] for r in ratings} 
users = db['users'].find({'UserId': {'$in': list(user_ids)}}) 
users_by_id = {u['UserId']: u for u in users} 
for rating in ratings: 
    u1 = users_by_id.get(rating['UserId1']) 
    u2 = users_by_id.get(rating['UserId2']) 
    print('Rating between {} (UserId {:2}) and {} (UserId {:2}) is {:2}'.format(
     u1['Name'], u1['UserId'], u2['Name'], u2['UserId'], rating['Rating'])) 

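Note that the first approach issues one find for the ratings plus two finds per rating, while the second approach issues only two finds in total: one for the ratings and one for all the users they reference. If you access MongoDB over a network, this makes a huge performance difference.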
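To get from the optimized approach to the output format asked for in the question, the in-memory merge step could look roughly like the sketch below. The helper name `join_ratings_with_users` is hypothetical, and it takes plain lists of dicts so that it runs without a database; with PyMongo you would pass in the results of the two `find` calls. Skipping ratings that reference a missing user is an assumption, and the email addresses in the sample data are placeholders.

```python
def join_ratings_with_users(ratings, users):
    """Combine each rating with the names and emails of both users."""
    # Index the users once so each lookup is a dictionary access.
    users_by_id = {u['UserId']: u for u in users}
    joined = []
    for rating in ratings:
        u1 = users_by_id.get(rating['UserId1'])
        u2 = users_by_id.get(rating['UserId2'])
        if u1 is None or u2 is None:
            # Assumption: ratings pointing at a missing user are skipped.
            continue
        joined.append({
            'UserId1': u1['UserId'],
            'UserName1': u1['Name'],
            'UserEmail1': u1['EmailId'],
            'UserId2': u2['UserId'],
            'UserName2': u2['Name'],
            'UserEmail2': u2['EmailId'],
            'Rating': rating['Rating'],
        })
    return joined


# Sample documents shaped like collections X and Y in the question.
# The email addresses are placeholders, not values from the question.
sample_users = [
    {'UserId': '941AB', 'Name': 'Alex Andresson', 'EmailId': 'alex@example.com'},
    {'UserId': '768CD', 'Name': 'Bryan Barnes', 'EmailId': 'bryan@example.com'},
]
sample_ratings = [
    {'UserId1': '941AB', 'UserId2': '768CD', 'Rating': 0.8},
    {'UserId1': '941AB', 'UserId2': 'UNKNOWN', 'Rating': 0.1},
]

result = join_ratings_with_users(sample_ratings, sample_users)
print(result)
```

The second sample rating refers to a user that does not exist, so only the first one appears in the result.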

If possible, I would recommend using _id instead of UserId in the users collection.
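MongoDB always maintains a unique index on _id, so storing the user identifier there removes the need for a separate index on UserId. As a rough illustration, a user document could be reshaped before insertion like this; the helper name `use_user_id_as_primary_key` is made up for this sketch.

```python
def use_user_id_as_primary_key(user):
    """Return a copy of a user document with UserId moved into _id."""
    doc = {key: value for key, value in user.items() if key != 'UserId'}
    doc['_id'] = user['UserId']
    return doc


user = {'UserId': '941AB', 'Name': 'Alex Andresson'}
converted = use_user_id_as_primary_key(user)
print(converted)

# With documents stored this way, lookups would go through the built-in
# _id index, for example: db['users'].find_one({'_id': '941AB'})
```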

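Of course, this particular use case would be much easier with an SQL database. If you are using MongoDB for its performance and you have more reads than writes, consider caching the relevant user names in the rating documents.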
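A minimal sketch of that caching idea, assuming the names are copied into the rating document when it is written. The helper `build_rating_with_cached_names` and the field names UserName1 and UserName2 are illustrative choices, not part of any library.

```python
def build_rating_with_cached_names(user1, user2, rating):
    """Build a rating document that carries a copy of both user names."""
    return {
        'UserId1': user1['UserId'],
        'UserName1': user1['Name'],  # cached copy, may go stale
        'UserId2': user2['UserId'],
        'UserName2': user2['Name'],  # cached copy, may go stale
        'Rating': rating,
    }


alex = {'UserId': '941AB', 'Name': 'Alex Andresson'}
bryan = {'UserId': '768CD', 'Name': 'Bryan Barnes'}
rating_doc = build_rating_with_cached_names(alex, bryan, 0.8)
print(rating_doc)

# The document could then be stored with db['ratings'].insert_one(rating_doc),
# and reading a rating would no longer need a second query for the names.
```

The trade-off is that every rename of a user also has to update the cached copies in the ratings collection, which is why this pays off mainly when reads dominate.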
