2017-02-18 71 views
1

Group a list of sublists by some of the values in each sublist

I have a dictionary whose values are lists of lists. For example:

{1: [[sender11, receiver11, text11, address11]], 
2: [[sender21, receiver21, text21, address21], [sender22, receiver22, text22, address22]], 
3: [[sender31, receiver31, text31, address31], [sender32, receiver32, text32, address32], [sender33, receiver33, text33, address33]], 
4: [[sender41, receiver41, text41, address41], [sender42, receiver42, text42, address42], [sender43, receiver43, text43, address43], [sender44, receiver44, text44, address44]]} 

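What I want to do is this: for every dictionary entry whose list has 2 or more elements (dict[2], dict[3] and dict[4] in this example), compare the sender, receiver, text values of each sublist. For each group of sublists that share the same sender, receiver, text, I would then do something.

So, for example, in dict[3], if sender31, receiver31, text31 have the same values as sender32, receiver32, text32 and sender33, receiver33, text33, then I would do something with all 3 sublists.

Or, in dict[4], if sender41, receiver41, text41 have the same values as sender42, receiver42, text42, while sender43, receiver43, text43 have the same values as sender44, receiver44, text44 but differ from sender41, receiver41, text41, then I would work on these 2 groups separately.

I wrote a Python script that pretty much brute-force compares the values of sender21, receiver21, text21 with those of sender22, receiver22, text22, i.e.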

if sender21 == sender22 and receiver21 == receiver22 and text21 == text22: 
    # Do something 

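This doesn't scale, because it only works for exactly 2 sublists, and I don't know how to implement it so that it works for any number of sublists greater than 1.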

Answers

1

I think defaultdict is the obvious way to go here:

from collections import defaultdict 

def collate(seq): 
    groups = defaultdict(list) 
    for subseq in seq: 
        groups[tuple(subseq[:3])].append(subseq[3]) 
    return groups 

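Depending on your actual data, you might replace tuple(subseq[:3]) in the function above with something like (subseq[1], subseq[4], subseq[5]), or append subseq itself instead of subseq[3]. It all depends on what you are doing with the data.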

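However, the keys have to be tuples rather than lists, because dictionary keys must be hashable and lists, being mutable, are not.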
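
To make the two points above concrete, here is a minimal sketch rather than a definitive implementation. The helper name `collate_by` and its `key_indices` parameter are made up for illustration and are not part of the answer's code. It builds the key from arbitrary positions, stores the whole sublist, and then shows what happens if a list is used as a key instead of a tuple.

```python
from collections import defaultdict

def collate_by(seq, key_indices):
    """Group sublists by the values at key_indices (hypothetical helper)."""
    groups = defaultdict(list)
    for subseq in seq:
        # Build the key as a tuple: tuples are hashable, lists are not.
        key = tuple(subseq[i] for i in key_indices)
        # Keep the whole sublist instead of only subseq[3].
        groups[key].append(subseq)
    return groups

rows = [
    ['S1', 'R1', 'T1', 'A3'],
    ['S2', 'R2', 'T2', 'A4'],
    ['S1', 'R1', 'T1', 'A5'],
]
grouped = collate_by(rows, key_indices=(0, 1, 2))

# Using a list as a dictionary key raises TypeError,
# which is why the tuple() conversion matters.
try:
    {}[['S1', 'R1', 'T1']] = 'A3'
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)
```

Passing the indices as a parameter keeps the grouping logic in one place if the layout of the sublists changes.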

Example:

>>> data = [ 
...  ['S1', 'R1', 'T1', 'A3'], 
...  ['S2', 'R2', 'T2', 'A4'], 
...  ['S1', 'R1', 'T1', 'A5'], 
...  ['S2', 'R2', 'T2', 'A6'] 
... ] 

>>> collate(data) 
defaultdict(<type 'list'>, { 
    ('S2', 'R2', 'T2'): ['A4', 'A6'], 
    ('S1', 'R1', 'T1'): ['A3', 'A5'] 
}) 

You can work with this just like any other dictionary, e.g.

>>> for (sender, receiver, text), addresses in collate(data).items(): 
...  print sender, receiver, text 
...  print '|'.join(addresses) 
...  print 
... 
S2 R2 T2 
A4|A6 

S1 R1 T1 
A3|A5 
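
To tie this back to the dictionary in the question, the sketch below runs collate over each entry and keeps only the groups with more than one matching sublist. The `messages` data and the `matching_groups` helper are assumptions made up for illustration, and "do something" is left as simply returning the groups.

```python
from collections import defaultdict

def collate(seq):
    groups = defaultdict(list)
    for subseq in seq:
        groups[tuple(subseq[:3])].append(subseq[3])
    return groups

# Made-up stand-in for the dictionary in the question.
messages = {
    1: [['S1', 'R1', 'T1', 'A1']],
    2: [['S1', 'R1', 'T1', 'A2'], ['S1', 'R1', 'T1', 'A3']],
    4: [['S1', 'R1', 'T1', 'A4'], ['S1', 'R1', 'T1', 'A5'],
        ['S2', 'R2', 'T2', 'A6'], ['S2', 'R2', 'T2', 'A7']],
}

def matching_groups(messages):
    """Map each dictionary key to its groups of 2+ matching sublists."""
    result = {}
    for key, sublists in messages.items():
        if len(sublists) < 2:
            continue  # a single sublist has nothing to be compared with
        groups = collate(sublists)
        # Keep only (sender, receiver, text) keys shared by 2+ sublists.
        shared = {k: v for k, v in groups.items() if len(v) > 1}
        if shared:
            result[key] = shared
    return result

found = matching_groups(messages)
```

Here entry 4 yields two independent groups, which matches the dict[4] case described in the question.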
  
+0

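Thanks! This works great. However, what if I now want '(sender, receiver, text)' and '(receiver, sender, text)' to end up in the same group, rather than requiring an exact match on '(sender, receiver, text)', i.e. the order of sender and receiver doesn't matter? Is that possible? Do I need to hash it? – Rayne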

+1

The collection type that is a) immutable and b) doesn't care about order is 'frozenset', so something like 'groups[frozenset(subseq[:2]), subseq[2]].append(subseq[3])' sounds about right. Adjust as necessary. –
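
A minimal sketch of that suggestion, assuming the same four-element sublists as in the answer. The function name `collate_unordered` and the sample data are made up for illustration.

```python
from collections import defaultdict

def collate_unordered(seq):
    groups = defaultdict(list)
    for subseq in seq:
        # frozenset ignores order, so (sender, receiver) and
        # (receiver, sender) produce the same key; text is kept separate.
        key = (frozenset(subseq[:2]), subseq[2])
        groups[key].append(subseq[3])
    return groups

data = [
    ['S1', 'R1', 'T1', 'A3'],
    ['R1', 'S1', 'T1', 'A4'],  # sender and receiver swapped
    ['S1', 'R1', 'T2', 'A5'],  # same pair, different text
]
groups = collate_unordered(data)
```

Note that a frozenset also collapses duplicates, so a sublist whose sender equals its receiver gets a one-element set as the first part of its key.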

+0

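By the way, this is the kind of thing you should either know or be able to find quickly in the documentation if you want to be an effective programmer. Reading https://docs.python.org/2/library/stdtypes.html over and over until you know the standard types thoroughly will repay your effort many times over in the long run. –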