Faster, more 'pythonic' list of dictionaries

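For simplicity I have shown only 2 lists inside the list, but in reality I am dealing with hundreds of lists within a list, each containing a large number of dictionaries. I only want to get the value of the 'status' key from the first dictionary, without checking any of the other dictionaries in that list (because I know they all contain the same value for that key). Then I will perform some sort of clustering within each big dictionary. I need to efficiently concatenate all the 'title' values. Is there a way to make my code more elegant and faster?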

I have:

nested = [ 
    [ 
     {'id': 287, 'title': 'hungry badger', 'status': 'High'}, 
     {'id': 437, 'title': 'roadtrip to Kansas','status': 'High'} 
    ], 
    [ 
     {'id': 456, 'title': 'happy title here','status': 'Medium'}, 
     {'id': 342,'title': 'soft big bear','status': 'Medium'} 
    ] 
]

I want:

result = [ 
    { 
     'High': [ 
      {'id': 287, 'title': 'hungry badger'}, 
      {'id': 437, 'title': 'roadtrip to Kansas'} 
     ] 
    }, 
    { 
     'Medium': [ 
      {'id': 456, 'title': 'happy title here'}, 
      {'id': 342, 'title': 'soft big bear'} 
     ] 
    } 
]

I tried:

for oneList in nested:
    result = {}
    for i in oneList:
        a = list(i.keys())
        m = [i[key] for key in a if key not in ['id', 'title']]
        result[m[0]] = oneList
        for key in a:
            if key not in ['id', 'title']:
                del i[key]

Source

2016-09-17 el347

from itertools import groupby  
result = groupby(sum(nested,[]), lambda x: x['status'])

How it works:

sum(nested,[]) concatenates all the outer lists into one big list of dictionaries

groupby(, lambda x: x['status']) groups all the objects by their status property
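
To make the two steps concrete, here is a minimal sketch using the two-list sample from the question. The names flat and keys are only illustrative and are not part of the answer.

```python
from itertools import groupby

nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]

# Step 1: sum() starts from [] and adds each inner list, giving one flat list.
flat = sum(nested, [])
print(len(flat))  # 4

# Step 2: groupby() yields one (key, group) pair per run of equal keys.
keys = [key for key, group in groupby(flat, lambda x: x['status'])]
print(keys)  # ['High', 'Medium']
```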

Note that itertools.groupby returns a lazy iterator (not a list), so if you want to materialize it you need to do something like the following.

from itertools import groupby  
result = groupby(sum(nested,[]), lambda x: x['status']) 
result = {key:list(val) for key,val in result}
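
One caveat worth knowing: groupby only merges consecutive items with equal keys, so the dictionary comprehension above relies on each status appearing in one contiguous run. If the same status could show up in inner lists that are not adjacent, a later group would overwrite an earlier one. A minimal sketch, assuming a hypothetical input in which 'High' appears in two separate runs, sorts by the key first:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical data: 'High' occurs in two non-adjacent inner lists.
nested = [
    [{'id': 1, 'title': 'a', 'status': 'High'}],
    [{'id': 2, 'title': 'b', 'status': 'Medium'}],
    [{'id': 3, 'title': 'c', 'status': 'High'}],
]
flat = [d for inner in nested for d in inner]
get_status = itemgetter('status')

# Without sorting, the second 'High' run replaces the first in the dict.
unsorted_result = {k: list(g) for k, g in groupby(flat, get_status)}
print(len(unsorted_result['High']))  # 1

# Sorting by the same key makes equal statuses adjacent before grouping.
sorted_result = {
    k: list(g) for k, g in groupby(sorted(flat, key=get_status), get_status)
}
print(len(sorted_result['High']))  # 2
```

With the data from the question this extra sort is not needed, since every inner list carries a single status.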

Source

2016-09-17 00:09:16 gnicholas

OMG! @.@ Wow. Your solution is so fast. Thank you so much!!!! Works perfectly. – el347

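One: don't use `sum(nested, [])`. It is the slowest possible way to flatten, and it gets slower the more you flatten (it creates `n` temporary `list`s, each larger than the last). You are already using `itertools`, and you are only iterating over the result (no need for a real `list` at all), so just use `itertools.chain.from_iterable` to flatten (and, because `lambda`s are evil/slow when not needed, `operator.itemgetter` for the `key`): `groupby(chain.from_iterable(nested), itemgetter('status'))`. [`sum(x, [])` is _slow_ (see comments)](http://stackoverflow.com/a/39520827/364696). – ShadowRanger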

@ShadowRanger Thank you! Just ran this: from itertools import chain; import operator; s = groupby(chain.from_iterable(results), key=operator.itemgetter('status')); for key, grp in s: print(key, list(grp)) Everything works fine. – el347
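
For reference, the suggestion from the comments can be put together as a runnable sketch, using the sample data from the question. The name grouped is only illustrative.

```python
from itertools import chain, groupby
from operator import itemgetter

nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]

# chain.from_iterable walks the inner lists lazily, so no temporary
# concatenated lists are built along the way.
flat = chain.from_iterable(nested)

# itemgetter('status') plays the role of lambda x: x['status'].
grouped = {
    status: list(items)
    for status, items in groupby(flat, key=itemgetter('status'))
}
print(sorted(grouped))                     # ['High', 'Medium']
print([d['id'] for d in grouped['High']])  # [287, 437]
```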

You can use a defaultdict for each nested list:

import collections
nested = [
    [{'id': 287, 'title': 'hungry badger', 'status': 'High'},
     {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'}],
    [{'id': 456, 'title': 'happy title here', 'status': 'Medium'},
     {'id': 342, 'title': 'soft big bear', 'status': 'Medium'}],
]
result = []
for l in nested:
    r = collections.defaultdict(list)
    for d in l:
        name = d.pop('status')
        r[name].append(d)
    result.append(r)

This gives the following result:

>>> import pprint 
>>> pprint.pprint(result) 
[{'High': [{'id': 287, 'title': 'hungry badger'}, 
      {'id': 437, 'title': 'roadtrip to Kansas'}]}, 
{'Medium': [{'id': 456, 'title': 'happy title here'}, 
      {'id': 342, 'title': 'soft big bear'}]}]
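
Note that d.pop('status') modifies the dictionaries in nested in place. If the original data has to stay intact, and the status really is identical within each inner list as the question states, one possible variant reads the status from the first dictionary only and builds copies without that key. This is a sketch under those assumptions, and it also assumes that no inner list is empty.

```python
nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]

result = [
    {
        # Only the first dictionary of each inner list is asked for its status.
        inner[0]['status']: [
            # Copy every key except 'status', leaving the input untouched.
            {k: v for k, v in d.items() if k != 'status'}
            for d in inner
        ]
    }
    for inner in nested
]
print(list(result[0]))  # ['High']
```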

Source

2016-09-17 00:20:18 TigerhawkT3

Awesome. Tnx! Learning something new here every day. Heh. The itertools groupby solution looks really nice; it reduces complexity. Your answer taught me about collections.defaultdict(). Thanks again. – el347

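Oh, nice! Your solution removes the status and does what I was asking for. I will time both of these, and I will also play more with itertools' groupby... Thanks for the help, man! – el347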
