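Python: find duplicate dictionaries in a list and separate them with a count

I have a list of dictionaries, and some of the dictionaries are identical. I want to find the duplicates and add them to a new list or dictionary along with how many times each one occurs.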

import itertools

# myList is the input list of dictionaries given in the question.
myListCombined = list()
for a, b in itertools.combinations(myList, 2):
    # Empty when every (key, value) pair of a also appears in b.
    is_equal = set(a.items()) - set(b.items())
    if len(is_equal) == 0:
        a.update(count=2)
        myListCombined.append(a)
    else:
        a.update(count=1)
        b.update(count=1)
        myListCombined.append(a)
        myListCombined.append(b)

# Drop entries that occur again later in the list.
myListCombined = [i for n, i in enumerate(myListCombined) if i not in myListCombined[n + 1:]]

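This code sort of works, but only for 2 duplicated dicts in the list; a.update(count=2) won't work when there are more than two copies. After collecting them, my last line also removes the duplicate dictionaries, but I'm not sure whether it works correctly.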
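To see why the pairwise approach cannot arrive at a count of 3, it may help to look at what `itertools.combinations` actually yields. A small sketch, using a shortened hypothetical list called `records` (not the question's data): each equal pair only ever tells you "these two match", so three copies show up as three separate pairs rather than one group of three.

```python
import itertools

# Hypothetical stand-in for the question's list: three equal dicts and one different.
records = [{'name': 'John'}, {'name': 'John'}, {'name': 'John'}, {'name': 'Mary'}]

# Every unordered pair of positions in the list.
pairs = list(itertools.combinations(records, 2))
# Dicts can be compared directly with ==.
equal_pairs = [(a, b) for a, b in pairs if a == b]

print(len(pairs))        # 6 pairs for 4 items
print(len(equal_pairs))  # 3 equal pairs for 3 copies
```

Counting occurrences per distinct dictionary, rather than per pair, avoids this.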

Input:

[{'name': 'Mary', 'age': 25, 'salary': 1000}, 
{'name': 'John', 'age': 25, 'salary': 2000}, 
{'name': 'George', 'age': 30, 'salary': 2500}, 
{'name': 'John', 'age': 25, 'salary': 2000}, 
{'name': 'John', 'age': 25, 'salary': 2000}]

Desired output:

[{'name': 'Mary', 'age': 25, 'salary': 1000, 'count':1}, 
{'name': 'John', 'age': 25, 'salary': 2000, 'count': 3}, 
{'name': 'George', 'age': 30, 'salary': 2500, 'count': 1}]

Source

2017-08-24 Korhan Yüzbaş

Please post your input and expected output. – Ajax1234

Edited, thanks @Ajax1234. –

Please see my answer below. – Ajax1234

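You can try the following, which first converts each dictionary to a frozenset of (key, value) tuples, so that they are hashable as required by collections.Counter.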

import collections 
a = [{'a':1}, {'a':1},{'b':2}] 
print(collections.Counter(map(lambda x: frozenset(x.items()),a)))

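Edited to reflect the desired input/output:

from copy import deepcopy

def count_duplicate_dicts(list_of_dicts):
    # Count against an untouched copy, because the originals gain a 'count' key below.
    cpy = deepcopy(list_of_dicts)
    for d in list_of_dicts:
        # Every dict is annotated in place; duplicates stay in the list.
        d['count'] = cpy.count(d)
    return list_of_dicts

x = [{'a':1},{'a':1}, {'c':3}]
print(count_duplicate_dicts(x))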
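The function above annotates every dictionary but leaves the duplicates in the list. If you also want one entry per distinct dictionary, as in the desired output, something along these lines might work. This is only a sketch: `count_unique_dicts` is a name made up here, and it assumes all values are hashable and that you are on Python 3.7+ so that dicts keep insertion order.

```python
def count_unique_dicts(list_of_dicts):
    """Return one new dict per distinct input dict, with a 'count' key added."""
    first_seen = {}  # hashable key -> first dict seen with those items
    counts = {}      # hashable key -> number of occurrences
    for d in list_of_dicts:
        key = frozenset(d.items())  # order-independent, hashable view of d
        first_seen.setdefault(key, d)
        counts[key] = counts.get(key, 0) + 1
    # Build new dicts so the caller's dictionaries are left untouched.
    return [{**first_seen[key], 'count': n} for key, n in counts.items()]


data = [{'name': 'Mary', 'age': 25, 'salary': 1000},
        {'name': 'John', 'age': 25, 'salary': 2000},
        {'name': 'George', 'age': 30, 'salary': 2500},
        {'name': 'John', 'age': 25, 'salary': 2000},
        {'name': 'John', 'age': 25, 'salary': 2000}]

result = count_unique_dicts(data)
print(result)
```

Unlike `list.count` inside a loop, this makes a single pass over the list, and the result follows the order in which each dictionary first appears.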

Source

2017-08-24 18:38:05 Solaxun

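I got stuck when I used collections.Counter because dictionaries are not hashable. Thanks for your help! So, since a frozenset is not subscriptable, should I use 'dict(frozenset)['salary']' to reach the value? –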
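On the follow-up about reading a value back out: a frozenset of (key, value) pairs can be turned back into a dictionary with `dict(...)`, after which normal subscripting works. A minimal sketch, with a hypothetical list `people` standing in for the real data:

```python
from collections import Counter

# Hypothetical sample data, shortened from the question's records.
people = [{'name': 'John', 'salary': 2000},
          {'name': 'John', 'salary': 2000},
          {'name': 'Mary', 'salary': 1000}]

counts = Counter(frozenset(p.items()) for p in people)

salaries = {}
for key, n in counts.items():
    record = dict(key)  # rebuild a normal dict from the (key, value) pairs
    salaries[record['name']] = (record['salary'], n)

print(salaries)  # {'John': (2000, 2), 'Mary': (1000, 1)}
```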

You can use collections.Counter to take the counts, and then rebuild the dictionaries after adding the count value to each frozenset from the Counter:

from collections import Counter

# myList is the list of dictionaries from the question.
# Each frozenset key d gets a ('count', c) pair added, then is turned back into a dict.
l = [dict(d | {('count', c)})
     for d, c in Counter(frozenset(d.items()) for d in myList).items()]
print(l)
# [{'salary': 1000, 'name': 'Mary', 'age': 25, 'count': 1},
# {'name': 'John', 'salary': 2000, 'age': 25, 'count': 3},
# {'salary': 2500, 'name': 'George', 'age': 30, 'count': 1}]

Source

2017-08-24 18:58:31

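If your dictionaries are well structured, their contents are simple data types such as numbers and strings, and you have further data analysis to do afterwards, I suggest using pandas, which offers rich functionality. Here is sample code for your case: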

In [32]: import pandas as pd
    ...: 
    ...: data = [{'name': 'Mary', 'age': 25, 'salary': 1000},
    ...: {'name': 'John', 'age': 25, 'salary': 2000},
    ...: {'name': 'George', 'age': 30, 'salary': 2500},
    ...: {'name': 'John', 'age': 25, 'salary': 2000},
    ...: {'name': 'John', 'age': 25, 'salary': 2000}]
    ...: 
    ...: df = pd.DataFrame(data)
    ...: df['counts'] = 1
    ...: df = df.groupby(df.columns.tolist()[:-1]).sum().reset_index(drop=False)
    ...: 

In [33]: df
Out[33]: 
   age    name  salary  counts
0   25    John    2000       3
1   25    Mary    1000       1
2   30  George    2500       1

In [34]: df.to_dict(orient='records')
Out[34]: 
[{'age': 25, 'counts': 3, 'name': 'John', 'salary': 2000},
{'age': 25, 'counts': 1, 'name': 'Mary', 'salary': 1000},
{'age': 30, 'counts': 1, 'name': 'George', 'salary': 2500}]

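The logic is:

(1) First, build a DataFrame from the data.

(2) groupby applies an aggregate function (here, sum) to each group.

(3) To turn the output back into dicts, you can call to_dict on the DataFrame.

pandas is a big package and takes some time to learn, but it is well worth knowing. It is very powerful and can make your data analysis faster and more elegant.

Thanks.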

Source

2017-08-24 19:04:04 rojeeer

You can try this:

import collections

d = [{'name': 'Mary', 'age': 25, 'salary': 1000},
{'name': 'John', 'age': 25, 'salary': 2000},
{'name': 'George', 'age': 30, 'salary': 2500},
{'name': 'John', 'age': 25, 'salary': 2000},
{'name': 'John', 'age': 25, 'salary': 2000}]

# Occurrences per name; this assumes a name identifies a record uniquely.
count = dict(collections.Counter([i["name"] for i in d]))
# Distinct records, each as a tuple of (key, value) pairs.
a = list(set(map(tuple, [i.items() for i in d])))
# Rebuild each record and attach the count for its name.
final_dict = [dict(list(i)+[("count", count[dict(i)["name"]])]) for i in a]

Output:

[{'salary': 2000, 'count': 3, 'age': 25, 'name': 'John'}, {'salary': 2500, 'count': 1, 'age': 30, 'name': 'George'}, {'salary': 1000, 'count': 1, 'age': 25, 'name': 'Mary'}]
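One thing to watch with this version: the count is keyed on `name` alone, so two different people who happen to share a name would be counted together. If that can occur in your data, keying on the whole record may be safer. A sketch with a made-up list `staff` (the second John's salary is invented purely for illustration):

```python
from collections import Counter

# Hypothetical data: two different records that share the name John.
staff = [
    {'name': 'John', 'salary': 2000},
    {'name': 'John', 'salary': 3000},
    {'name': 'John', 'salary': 2000},
]

# Keyed on the name only: all three rows are lumped together.
by_name = Counter(p['name'] for p in staff)

# Keyed on the full record: sorted (key, value) pairs give a stable, hashable key.
by_record = Counter(tuple(sorted(p.items())) for p in staff)

print(by_name['John'])                                   # 3
print(by_record[(('name', 'John'), ('salary', 2000))])   # 2
print(by_record[(('name', 'John'), ('salary', 3000))])   # 1
```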

Source

2017-08-24 19:07:09 Ajax1234
