2009-01-03 58 views
32
Grouping nested lists

I have the following data structure (a list of lists) that I'd like to sort and group in Python:

[ 
['4', '21', '1', '14', '2008-10-24 15:42:58'], 
['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
['5', '21', '3', '19', '2008-10-24 15:45:45'], 
['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
['7', '22', '3', '2somename', '2008-10-24 15:45:51'] 
] 

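I'd like to be able to:

  1. Use a function to reorder the list so that I can group by each item in the inner lists. For example, I'd like to be able to group by the second column (so that all the 21s are together).

  2. Use a function to display only certain values from each inner list. For example, I'd like to reduce this list to only the rows whose fourth field is '2somename'.

So the list would look like this: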

[ 
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'] 
] 
+2

A minor point about the grouping, but you should probably use tuples for the inner lists – hop 2009-01-04 00:26:30

Answers

45

For the first question, the first thing you should do is sort the list by the second field:

x = [ 
['4', '21', '1', '14', '2008-10-24 15:42:58'], 
['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
['5', '21', '3', '19', '2008-10-24 15:45:45'], 
['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
['7', '22', '3', '2somename', '2008-10-24 15:45:51'] 
] 

from operator import itemgetter 

x.sort(key=itemgetter(1)) 

Then you can use the groupby function from itertools:

from itertools import groupby 
y = groupby(x, itemgetter(1)) 

Now y is an iterator that yields (element, item iterator) tuples. These tuples are harder to explain than to show in code:

for elt, items in groupby(x, itemgetter(1)): 
    print(elt, items) 
    for i in items: 
        print(i) 

It prints:

21 <itertools._grouper object at 0x511a0> 
['4', '21', '1', '14', '2008-10-24 15:42:58'] 
['5', '21', '3', '19', '2008-10-24 15:45:45'] 
['6', '21', '1', '1somename', '2008-10-24 15:45:49'] 
22 <itertools._grouper object at 0x51170> 
['3', '22', '4', '2somename', '2008-10-24 15:22:03'] 
['7', '22', '3', '2somename', '2008-10-24 15:45:51'] 
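
Two things are easy to trip over here: groupby only merges adjacent rows with equal keys (hence the sort beforehand), and each group iterator is used up as the outer loop advances. A minimal sketch, assuming you want to keep the groups around afterwards; the names `rows`, `second_field` and `groups` are illustrative and not from the question:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

second_field = itemgetter(1)

# groupby only groups consecutive equal keys, so sort on the same key first.
rows.sort(key=second_field)

# Copy each group into a list so it is still usable after the loop moves on.
groups = {key: list(items) for key, items in groupby(rows, second_field)}

for key, items in groups.items():
    print(key, [row[0] for row in items])
```

Because list.sort is stable, rows inside each group keep their original relative order.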

For the second part, you should use a list comprehension, as already mentioned here:

from pprint import pprint as pp 
pp([y for y in x if y[3] == '2somename']) 

Which prints:

[['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
['7', '22', '3', '2somename', '2008-10-24 15:45:51']] 
+1

I've added the list comprehension example. – jfs 2009-01-03 18:34:11

3

If I understand your question correctly, the following code should do the job:

l = [ 
['4', '21', '1', '14', '2008-10-24 15:42:58'], 
['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
['5', '21', '3', '19', '2008-10-24 15:45:45'], 
['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
['7', '22', '3', '2somename', '2008-10-24 15:45:51'] 
] 

def compareField(field): 
    def c(l1,l2): 
        return cmp(l1[field], l2[field]) 
    return c 

# Use compareField(1) as the ordering criterion, i.e. sort only with 
# respect to the 2nd field 
l.sort(compareField(1)) 
for row in l: print row 

print 
# Select only those sublists for which 4th field=='2somename' 
l2somename = [row for row in l if row[3]=='2somename'] 
for row in l2somename: print row 

Output:

['4', '21', '1', '14', '2008-10-24 15:42:58'] 
['5', '21', '3', '19', '2008-10-24 15:45:45'] 
['6', '21', '1', '1somename', '2008-10-24 15:45:49'] 
['3', '22', '4', '2somename', '2008-10-24 15:22:03'] 
['7', '22', '3', '2somename', '2008-10-24 15:45:51'] 

['3', '22', '4', '2somename', '2008-10-24 15:22:03'] 
['7', '22', '3', '2somename', '2008-10-24 15:45:51'] 
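
The code above is Python 2: the built-in `cmp`, the comparison-function argument to `sort`, and the `print` statement no longer exist in Python 3. A rough Python 3 sketch that keeps the comparison-function style by wrapping it with `functools.cmp_to_key`; the name `compare_field` and the hand-rolled three-way comparison are my own substitutions:

```python
from functools import cmp_to_key

l = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

def compare_field(field):
    # Python 3 has no built-in cmp(), so emulate it for the chosen field:
    # returns -1, 0 or 1.
    def c(l1, l2):
        a, b = l1[field], l2[field]
        return (a > b) - (a < b)
    return c

# sort() only accepts key= in Python 3; cmp_to_key adapts the comparator.
l.sort(key=cmp_to_key(compare_field(1)))
for row in l:
    print(row)

print()
# Select only those sublists whose 4th field is '2somename'.
l2somename = [row for row in l if row[3] == '2somename']
for row in l2somename:
    print(row)
```

For a plain single-column sort, `l.sort(key=lambda row: row[1])` does the same thing with less machinery.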
+0

The 'cmp' argument to sort is going away in 2.6/3.0, so it's better to use the 'key' argument, which extracts a sort key. Otherwise, +1. – 2009-01-03 17:29:18

+0

Removed 'cmp=', which should be the first argument anyway. By the way, I'm using Python 2.6.1 and everything works fine... – 2009-01-03 17:36:49

7

If you assign it to the variable "a"...

#1:

a.sort(lambda x,y: cmp(x[1], y[1])) 

#2:

filter(lambda x: x[3]=="2somename", a) 
+0

than itemgetter – 2016-06-30 09:18:09

+0

A simpler and cleaner approach, lambda for the win. I really like this solution – alfredocambera 2016-11-10 19:31:01

2

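Use a function to reorder the list so that I can group by each item in the list. For example, I'd like to be able to group by the second column (so that all the 21s are together)

Lists have a built-in sort method, and you can provide a function that extracts the sort key.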

>>> import pprint 
>>> l.sort(key = lambda ll: ll[1]) 
>>> pprint.pprint(l) 
[['4', '21', '1', '14', '2008-10-24 15:42:58'], 
['5', '21', '3', '19', '2008-10-24 15:45:45'], 
['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
['7', '22', '3', '2somename', '2008-10-24 15:45:51']] 

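Use a function to display only certain values from each inner list. For example, I'd like to reduce this list to only contain '2somename'

This looks like a job for list comprehensions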

>>> [ll[3] for ll in l] 
['14', '2somename', '19', '1somename', '2somename'] 
+0

Replace `[ll[3] for ll in l]` with `[ll for ll in l if ll[3] == '2somename']` and fix the output. – jfs 2009-01-03 18:39:05

2

If you're going to do a lot of sorting and filtering, you might like some helper functions.

m = [ 
['4', '21', '1', '14', '2008-10-24 15:42:58'], 
['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
['5', '21', '3', '19', '2008-10-24 15:45:45'], 
['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
['7', '22', '3', '2somename', '2008-10-24 15:45:51'] 
] 

# Sort and filter helpers. 
sort_on = lambda pos:  lambda x: x[pos] 
filter_on = lambda pos,val: lambda l: l[pos] == val 

# Sort by second column 
m = sorted(m, key=sort_on(1)) 

# Filter on 4th column, where value = '2somename' 
m = filter(filter_on(3,'2somename'),m) 
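
One caveat: in Python 3, `filter` returns a lazy one-shot iterator rather than a list, so the last line above leaves `m` as an iterator. A small sketch of the same helpers, assuming you want a real list back; `rows`, `by_second` and `result` are illustrative names:

```python
rows = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

def sort_on(pos):
    # Build a key function that picks column `pos` out of a row.
    return lambda row: row[pos]

def filter_on(pos, val):
    # Build a predicate that is true when column `pos` equals `val`.
    return lambda row: row[pos] == val

# Sort by the second column.
by_second = sorted(rows, key=sort_on(1))

# Filter on the 4th column; list() forces the lazy filter object.
result = list(filter(filter_on(3, '2somename'), by_second))
print(result)
```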
1

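It looks a lot like you're trying to use a list as a database.

These days Python includes sqlite bindings in the core distribution. If you don't need persistence, creating an in-memory sqlite database is very simple (see How do I create a sqllite3 in-memory database?).

You can then use SQL statements to do all of this sorting and filtering without reinventing the wheel.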
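
A minimal sketch of that idea using the standard-library `sqlite3` module. The table name `records` and the column names are made up for illustration, since the question never says what the fields mean:

```python
import sqlite3

rows = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# ':memory:' creates a throwaway database that lives only in RAM.
conn = sqlite3.connect(':memory:')

# Hypothetical schema: every column is kept as text, as in the question.
conn.execute(
    'CREATE TABLE records (id TEXT, grp TEXT, n TEXT, name TEXT, stamp TEXT)'
)
conn.executemany('INSERT INTO records VALUES (?, ?, ?, ?, ?)', rows)

# Request 1: bring rows with the same second column together.
ordered = conn.execute('SELECT * FROM records ORDER BY grp, id').fetchall()

# Request 2: keep only rows whose fourth column is '2somename'.
matching = conn.execute(
    'SELECT * FROM records WHERE name = ? ORDER BY id', ('2somename',)
).fetchall()

conn.close()

for row in ordered:
    print(row)
print(matching)
```

Note that `fetchall` returns tuples rather than lists, and that text columns sort lexicographically, so numeric ordering would need integer columns or a CAST.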

2

For part (2), where x is your array, I think you want:

[y for y in x if y[3] == '2somename'] 

This will return a list of just those inner lists whose fourth value is '2somename'... although it looks like Kamil is giving you the best advice with SQL...

1

You're just creating indexes on your structure, right?

>>> from collections import defaultdict 
>>> def indexOn(things, pos): 
...  inx= defaultdict(list) 
...  for t in things: 
...    inx[t[pos]].append(t) 
...  return inx 
... 
>>> a=[ 
... ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
... ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
... ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
... ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
... ['7', '22', '3', '2somename', '2008-10-24 15:45:51'] 
... ] 

Here's your first request, grouping by position 1.

>>> import pprint 
>>> pprint.pprint(dict(indexOn(a,1))) 
{'21': [['4', '21', '1', '14', '2008-10-24 15:42:58'], 
     ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
     ['6', '21', '1', '1somename', '2008-10-24 15:45:49']], 
'22': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
     ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]} 

Here's your second request, by position 3.

>>> dict(indexOn(a,3)) 
{'19': [['5', '21', '3', '19', '2008-10-24 15:45:45']], '14': [['4', '21', '1', '14', '2008-10-24 15:42:58']], '2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'], ['7', '22', '3', '2somename', '2008-10-24 15:45:51']], '1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']]} 
>>> pprint.pprint(_) 
{'14': [['4', '21', '1', '14', '2008-10-24 15:42:58']], 
'19': [['5', '21', '3', '19', '2008-10-24 15:45:45']], 
'1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']], 
'2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
       ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]}
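
Once an index like this exists, pulling out the '2somename' rows is a plain dictionary lookup. A short sketch that restates the `indexOn` idea as `index_on` so the snippet stands alone; `by_name` is an illustrative name:

```python
from collections import defaultdict

def index_on(things, pos):
    # Map each distinct value at column `pos` to the rows that carry it.
    index = defaultdict(list)
    for t in things:
        index[t[pos]].append(t)
    return index

a = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

by_name = index_on(a, 3)
print(by_name['2somename'])

# Use .get() for keys that may be absent: indexing a defaultdict with a
# missing key would silently insert an empty list.
print(by_name.get('nosuchname', []))
```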