Using setdefault in Python 3.6 to display (name, ID, and frequency count) with information from two different files

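I'm trying to read two .dat files and create a program that uses the values of aid2name as the keys of a dictionary whose values are the keys and values of aid2numplays. The goal is to produce a result containing (artist name, artist ID, play frequency). Note that the first file provides the artist name and artist ID, while the second file provides a user ID, artist ID, and frequency for each user. Any ideas on how to aggregate these frequencies across users and then display them in (artist name, artist ID, play frequency) format? Here is what I've managed so far: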

import codecs 
aid2name = {} 
d2 = {} 
fp = codecs.open("artists.dat", encoding = "utf-8") 
fp.readline() #skip first line of headers 
for line in fp: 
    line = line.strip() 
    fields = line.split('\t') 
    aid = int(fields[0]) 
    name = fields[1] 
    aid2name = {int(aid), name} 
    d2.setdefault(fields[1], {}) 
    #print (aid2name) 
# do other processing 
    #print(dictionary) 

aid2numplays = {} 
fp = codecs.open("user_artists.dat", encoding = "utf-8") 
fp.readline() #skip first line of headers 
for line in fp: 
    line = line.strip() 
    fields = line.split('\t') 
    uid = int(fields[0]) 
    aid = int(fields[1]) 
    weight = int(fields[2]) 
    aid2numplays = [int(aid), int(weight)] 
    #print(aid2numplays) 
    #print(uid, aid, weight) 

for (d2.fields[1], value) in d2: 
    group = d2.setdefault(d2.fields[1], {}) # key might exist already 
    group.append(aid2numplays) 

print(group)

Source

2017-04-11 pythonuser890

It might help to see what the final data structure should look like; I'm not sure how you intend to use setdefault (examples: http://stackoverflow.com/questions/3483520/use-cases-for-the-setdefault-dict-method) – brennan

Edit: Regarding the use of setdefault, if you want to group the user data by artistID, then you can:

grouped_data = {} 
for u in users: 
    k, v = u[1], {'userID': u[0], 'weight': u[2]} 
    grouped_data.setdefault(k, []).append(v)

This is essentially the same as writing:

grouped_data = {} 
for u in users: 
    k, v = u[1], {'userID': u[0], 'weight': u[2]} 
    if k in grouped_data:
        grouped_data[k].append(v)
    else:
        grouped_data[k] = [v]
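To check that the two forms behave the same, here is a minimal sketch that runs both on a few sample rows. The first row matches the example shown in this answer; the other two rows and the names `grouped_a` and `grouped_b` are made up for illustration.

```python
# Sample rows in the shape [userID, artistID, weight].
# Only the first row comes from the thread; the rest are hypothetical.
users = [
    ['2', '51', '13883'],
    ['2', '52', '11690'],
    ['3', '51', '228'],
]

# Version 1: setdefault creates the empty list the first time a key is seen.
grouped_a = {}
for u in users:
    k, v = u[1], {'userID': u[0], 'weight': u[2]}
    grouped_a.setdefault(k, []).append(v)

# Version 2: explicit membership test.
grouped_b = {}
for u in users:
    k, v = u[1], {'userID': u[0], 'weight': u[2]}
    if k in grouped_b:
        grouped_b[k].append(v)
    else:
        grouped_b[k] = [v]

print(grouped_a == grouped_b)  # True
print(grouped_a['51'])
# [{'userID': '2', 'weight': '13883'}, {'userID': '3', 'weight': '228'}]
```

Note that the weights stay strings here because the rows were produced by split(); convert with int() before doing arithmetic on them.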

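As for an example of how to count the number of times an artist appears in the different users' data, you could read the data into a list of lists: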

import codecs

with codecs.open("artists.dat", encoding = "utf-8") as f: 
    artists = f.readlines() 

with codecs.open("user_artists.dat", encoding = "utf-8") as f: 
    users = f.readlines() 

artists = [x.strip().split('\t') for x in artists][1:] # [['1', 'MALICE MIZER', .. 
users = [x.strip().split('\t') for x in users][1:] # [['2', '51', '13883'], ..]

Iterate over the artists to build a dictionary keyed by artistID. Add a placeholder for the play statistics.

data = {} 
for a in artists: 
    artistID, name = a[0], a[1] 
    data[artistID] = {'name': name, 'plays': 0}

Iterate over the users and update the dictionary with each row:

for u in users: 
    artistID = u[1] 
    data[artistID]['plays'] += 1

Output, for data:

{'1': {'name': 'MALICE MIZER', 'plays': 3}, 
'2': {'name': 'Diary of Dreams', 'plays': 12}, 
'3': {'name': 'Carpathian Forest', 'plays': 3}, ..}

Edit: To iterate over the user data and create a dictionary of all the artists associated with each user, we can:

artist_list = [x.strip().split('\t') for x in artists][1:] 
user_stats_list = [x.strip().split('\t') for x in users][1:] 

artists = {} 
for a in artist_list: 
    artistID, name = a[0], a[1] 
    artists[artistID] = name 

grouped_user_stats = {} 
for u in user_stats_list: 
    userID, artistID, weight = u 
    if userID not in grouped_user_stats:
        grouped_user_stats[userID] = {artistID: {'name': artists[artistID], 'plays': 1}}
    else:
        if artistID not in grouped_user_stats[userID]:
            grouped_user_stats[userID][artistID] = {'name': artists[artistID], 'plays': 1}
        else:
            grouped_user_stats[userID][artistID]['plays'] += 1
            print('this never happens')
            # it looks like the same artist is never listed twice for the same user

Output:

{'2': {'100': {'name': 'ABC', 'plays': 1}, 
     '51': {'name': 'Duran Duran', 'plays': 1}, 
     '52': {'name': 'Morcheeba', 'plays': 1}, 
     '53': {'name': 'Air', 'plays': 1}, .. }, 
.. 
}
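Since the question is about setdefault, the nested if/else above can also be written with two setdefault calls. This is only a sketch: the artist names for IDs 51 and 52 match the output shown above, but the rows themselves are hypothetical sample data rather than the real file contents.

```python
# Hypothetical stand-ins for the parsed files.
artists = {'51': 'Duran Duran', '52': 'Morcheeba'}
user_stats_list = [
    ['2', '51', '13883'],
    ['2', '52', '11690'],
    ['3', '51', '228'],
]

grouped_user_stats = {}
for userID, artistID, weight in user_stats_list:
    # Get (or create) the per-user dictionary.
    per_user = grouped_user_stats.setdefault(userID, {})
    # Get (or create) the entry for this artist, starting the count at 0.
    entry = per_user.setdefault(artistID, {'name': artists[artistID], 'plays': 0})
    entry['plays'] += 1

print(grouped_user_stats['2']['51'])  # {'name': 'Duran Duran', 'plays': 1}
```

One trade-off: the default passed to setdefault is built on every iteration even when the key already exists, which is harmless here but worth knowing for expensive defaults.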

Source

2017-04-11 19:38:00 brennan

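To finally display them in (artist name, artist ID, play frequency) format: `[{'id': k, **v} for k, v in data.items()]` or `[(k, *v.values()) for k, v in data.items()]` for a list of dictionaries or tuples, respectively. – mab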
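If the tuples should follow the exact (artist name, artist ID, play frequency) order from the question, the fields can be picked out by name instead of unpacked. A minimal sketch, assuming `data` has the shape shown in the answer's output; `rows` is a name introduced here for illustration:

```python
# A small slice of the data dictionary, in the shape shown in the answer.
data = {
    '1': {'name': 'MALICE MIZER', 'plays': 3},
    '2': {'name': 'Diary of Dreams', 'plays': 12},
}

# Build (name, id, plays) tuples by naming each field explicitly,
# so the result does not depend on the key order inside each value.
rows = [(v['name'], k, v['plays']) for k, v in data.items()]

for name, artist_id, plays in rows:
    print(name, artist_id, plays)
```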

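Thanks to you both. I had been looking at it from a different angle, but I think it's clicking now. Brennan, out of curiosity, how did you know the output without the details of the files? – pythonuser890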

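Also, I'd like to know how to aggregate the total plays for each artist. I've been trying to add up the plays across user IDs and then display them as an aggregate. Any hints or suggestions would be very helpful. Thank you very much for these tips. – pythonuser890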
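One possible way to get per-artist totals is to sum the third column instead of counting rows. The sketch below assumes that column is the play count (the "frequency" described in the question) and uses made-up rows; `names`, `total_plays`, and `results` are illustrative names, not part of the original code.

```python
# Hypothetical parsed rows: artists as [artistID, name],
# users as [userID, artistID, weight].
artists = [['1', 'MALICE MIZER'], ['2', 'Diary of Dreams']]
users = [['2', '1', '100'], ['3', '1', '50'], ['3', '2', '7']]

# Map artistID -> name for the final display.
names = {a[0]: a[1] for a in artists}

# Sum the weights per artist; setdefault supplies 0 for unseen artists.
total_plays = {}
for user_id, artist_id, weight in users:
    total_plays[artist_id] = total_plays.setdefault(artist_id, 0) + int(weight)

# (artist name, artist ID, total plays), most played first.
results = [(names[aid], aid, plays) for aid, plays in total_plays.items()]
results.sort(key=lambda r: r[2], reverse=True)

for row in results:
    print(row)
# ('MALICE MIZER', '1', 150)
# ('Diary of Dreams', '2', 7)
```

Artists that no user has played do not appear in `results`; iterate over `names` and use `total_plays.get(aid, 0)` if they should be listed with a total of 0.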
