2016-03-02 109 views

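How do I insert dictionaries as values into a Python dictionary using a loop? I'm currently having trouble turning my CSV data into a dictionary.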

The file has 3 columns that I want to use:

userID, placeID, rating 
U1000, 12222, 3 
U1000, 13333, 2 
U1001, 13333, 4 

The result I want looks like this:

{'U1000': {'12222': 3, '13333': 2}, 
'U1001': {'13333': 4}} 

In other words, I want my data structure to look like:

sample = {} 
sample["U1000"] = {} 
sample["U1001"] = {} 
sample["U1000"]["12222"] = 3 
sample["U1000"]["13333"] = 2 
sample["U1001"]["13333"] = 4 

But I have a lot of data to process. I want to build the result with a loop, but I have been trying for 2 hours without success.

---The code below may be confusing---

My result currently looks like this:

{'U1000': ['12222', 3], 
'U1001': ['13333', 4]} 
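  1. Each value in the dictionary is a list rather than a dictionary
  2. User "U1000" appears several times in the data, but only once in my result

I suspect my code has a lot of mistakes. If you don't mind, please take a look: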

import numpy as np 
import pandas as pd 

reader = np.array(pd.read_csv("rating_final.csv")) 
included_cols = [0, 1, 2] 

sample= {} 
target=[] 
target1 =[] 
for row in reader: 
     content = list(row[i] for i in included_cols) 
     target.append(content[0]) 
     target1.append(content[1:3]) 

sample = dict(zip(target, target1)) 

How can I improve this code? I have searched Stack Overflow, but I couldn't work it out on my own. Could anyone please help me?

Thank you very much!


It looks like you want dictionaries as _values_, not as _keys_. Maybe correct the title to match? – ShadowRanger


Thanks for the reminder. I have corrected the title as well as the content!


Also, your example has `{'U1000': {'12222': 3}, {'1333': 2}, 'U1001': {'13333': 4}}`, which has the keys `'U1000'` and `'U1001'` but no key (or no value) associated with `{'1333': 2}`. You could have `{'U1000': {'12222': 3, '1333': 2}, 'U1001': {'13333': 4}}` or `{'U1000': [{'12222': 3}, {'1333': 2}], 'U1001': [{'13333': 4}]}`, but not what you provided. – ShadowRanger

Answers

2

This should do what you want:

import collections 

reader = ... 
sample = collections.defaultdict(dict) 

for user_id, place_id, rating in reader: 
    rating = int(rating) 
    sample[user_id][place_id] = rating 

print(dict(sample))  # dict() drops the defaultdict wrapper from the printed form 
# -> {'U1000': {'12222': 3, '1333': 2}, 'U1001': {'13333': 4}} 

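defaultdict is a convenient tool that supplies a default value whenever you access a key that is not in the dictionary. If you don't like that behavior (for example, because you want sample['non-existent-user-id'] to fail with a KeyError), use: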

reader = ... 
sample = {} 

for user_id, place_id, rating in reader: 
    rating = int(rating) 
    if user_id not in sample: 
        sample[user_id] = {} 
    sample[user_id][place_id] = rating 
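
As a rough illustration of how the two variants above differ, here is a minimal sketch. The `rows` list is a hypothetical stand-in for `reader`, filled with the sample data from the question; the other variable names are also just for illustration.

```python
import collections

# Hypothetical stand-in for `reader`: the sample rows from the question.
rows = [
    ("U1000", "12222", "3"),
    ("U1000", "13333", "2"),
    ("U1001", "13333", "4"),
]

auto = collections.defaultdict(dict)  # variant 1: defaultdict
plain = {}                            # variant 2: plain dict with explicit check

for user_id, place_id, rating in rows:
    rating = int(rating)
    auto[user_id][place_id] = rating
    if user_id not in plain:
        plain[user_id] = {}
    plain[user_id][place_id] = rating

# Both variants end up holding the same nested data.
same = dict(auto) == plain

# Looking up an unknown user silently creates an empty entry in the defaultdict...
auto["non-existent-user-id"]
created = "non-existent-user-id" in auto

# ...while the plain dict raises KeyError.
try:
    plain["non-existent-user-id"]
    raised = False
except KeyError:
    raised = True
```

Which behavior is preferable depends on whether a silent empty entry or a loud error is more useful for later lookups.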

Thanks for the clarification, this really helps!

1

The expected output in the example is impossible, because {'1333': 2} would not be associated with a key. You can get {'U1000': {'12222': 3, '1333': 2}, 'U1001': {'13333': 4}} though, with a dict of dicts:

sample = {} 
for row in reader: 
    userID, placeID, rating = row[:3] 
    sample.setdefault(userID, {})[placeID] = rating # Possibly int(rating)? 
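
For comparison with the `dict(zip(...))` attempt in the question, the following sketch shows why a repeated user ends up with only one entry there, and how `setdefault` keeps every row. The `users` and `pairs` lists are hypothetical stand-ins for the question's `target` and `target1`.

```python
# Hypothetical stand-ins for `target` and `target1` from the question.
users = ["U1000", "U1000", "U1001"]
pairs = [["12222", 3], ["13333", 2], ["13333", 4]]

# dict() keeps only the last value seen for a repeated key,
# so the first U1000 row is overwritten by the second.
flat = dict(zip(users, pairs))

# setdefault() returns the existing inner dict (or inserts a new empty one),
# so each row adds a place to the user's dict instead of replacing it.
nested = {}
for user, (place, rating) in zip(users, pairs):
    nested.setdefault(user, {})[place] = rating
```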

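Alternatively, use collections.defaultdict(dict) to avoid setdefault altogether (or the other approaches, which need a try/except KeyError or an if userID in sample: check and give up setdefault's atomicity in exchange for not creating empty dicts unnecessarily):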

import collections 

sample = collections.defaultdict(dict) 
for row in reader: 
    userID, placeID, rating = row[:3] 
    sample[userID][placeID] = rating 

# Optional conversion back to plain dict 
sample = dict(sample) 

Converting back to a plain dict ensures that future lookups don't autovivify keys and instead raise KeyError as usual, and if you print it, it looks like a normal dict.
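
A small sketch of what the conversion changes, using a single hand-written entry rather than real data. Note that `dict(sample)` is a shallow copy, so the inner dicts are shared rather than duplicated.

```python
import collections

sample = collections.defaultdict(dict)
sample["U1000"]["12222"] = 3

# The defaultdict's repr carries the wrapper and the default factory.
wrapped_repr = repr(sample)

# A plain dict prints like an ordinary dictionary.
frozen = dict(sample)
plain_repr = repr(frozen)

# The copy is shallow: both outer mappings point at the same inner dict.
shares_inner = frozen["U1000"] is sample["U1000"]

# After conversion, a missing key raises KeyError instead of being created.
try:
    frozen["U9999"]
    raised = False
except KeyError:
    raised = True
```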

If included_cols matters (because the names or column indices may change), you can use operator.itemgetter to speed up and simplify extracting all the needed columns at once:

from collections import defaultdict 
from operator import itemgetter 

included_cols = (0, 1, 2) 
# If columns in data were actually: 
# rating, foo, bar, userID, placeID 
# we'd do this instead, itemgetter will handle all the rest: 
# included_cols = (3, 4, 0) 
get_cols = itemgetter(*included_cols) # Create function to get needed indices at once 

sample = defaultdict(dict) 
# map(get_cols, ...) efficiently converts each row to a tuple of just 
# the three desired values as it goes, which also lets us unpack directly 
# in the for loop, simplifying code even more by naming all variables directly 
for userID, placeID, rating in map(get_cols, reader): 
    sample[userID][placeID] = rating # Possibly int(rating)? 
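
For completeness, here is an end-to-end sketch of the same idea. It swaps in the standard-library `csv` module in place of the pandas/NumPy reader from the question, and it assumes the file has a header row and a space after each comma, as in the sample shown. The `csv_text` string is a stand-in for the real file; with actual data you would open `rating_final.csv` instead.

```python
import csv
import io
from collections import defaultdict
from operator import itemgetter

# Stand-in for open("rating_final.csv", newline=""); contents copied
# from the sample in the question.
csv_text = (
    "userID, placeID, rating\n"
    "U1000, 12222, 3\n"
    "U1000, 13333, 2\n"
    "U1001, 13333, 4\n"
)

get_cols = itemgetter(0, 1, 2)  # userID, placeID, rating

sample = defaultdict(dict)
with io.StringIO(csv_text) as f:
    # skipinitialspace=True drops the space that follows each comma
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the header row (assumed to be present)
    for user_id, place_id, rating in map(get_cols, reader):
        sample[user_id][place_id] = int(rating)

sample = dict(sample)  # optional conversion back to a plain dict
print(sample)
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
```

Here the place IDs stay strings, matching the desired output in the question; whether they should be converted to integers is a separate decision.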

Thanks for your answer, this really helps!