2016-11-24 42 views

How do I apply groupby to a list of tuples in Python?

In my function, I create different tuples and append them to an initially empty list:

tup = (pattern,matchedsen) 
matchedtuples.append(tup) 

The patterns are regular expressions. I am looking for a way to apply groupby() to matchedtuples as follows.

For example:

matchedtuples = [(p1, s1) , (p1,s2) , (p2, s5)] 

And I am looking for this result:

result = [ (p1,(s1,s2)) , (p2, s5)] 

That way, the sentences that share the same pattern end up grouped together. How can I do this?

Answers

0

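If you need the output in exactly that form, you have to loop over the groups of matchedtuples yourself and build the list.

First, of course, if the matchedtuples list is not already sorted, sort it using itemgetter as the key: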

from operator import itemgetter as itmg 

li = sorted(matchedtuples, key=itmg(0)) 

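Then loop over the groups that groupby yields and append to the list r, depending on the size of each group:

from itertools import groupby

r = []
# Group the sorted list li (not the unsorted matchedtuples) by pattern
for i, j in groupby(li, key=itmg(0)):
    j = list(j)
    # A single match stays a bare value; several matches become a tuple
    ap = (i, j[0][1]) if len(j) == 1 else (i, tuple(s[1] for s in j))
    r.append(ap)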
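
The two steps above (sort, then group) can be combined into one function. This is only a sketch under the assumption that every element is a (pattern, sentence) pair; the name group_by_pattern is made up for illustration and does not come from the question:

```python
from itertools import groupby
from operator import itemgetter


def group_by_pattern(pairs):
    """Group (pattern, sentence) pairs by pattern.

    group_by_pattern is a hypothetical helper name, not from the original post.
    """
    result = []
    # Sort first so equal patterns are adjacent; groupby only merges neighbours.
    ordered = sorted(pairs, key=itemgetter(0))
    for pattern, group in groupby(ordered, key=itemgetter(0)):
        sentences = tuple(sentence for _, sentence in group)
        # Mirror the requested output: a lone sentence is not wrapped in a tuple.
        result.append((pattern, sentences[0] if len(sentences) == 1 else sentences))
    return result


matchedtuples = [("p1", "s1"), ("p2", "s5"), ("p1", "s2")]
print(group_by_pattern(matchedtuples))
# [('p1', ('s1', 's2')), ('p2', 's5')]
```

Because sorted is stable, the sentences inside each group keep the order in which they were appended.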
0

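My answer should work for the input structures shown below and prints output in the same form as the one you gave. Keep in mind that groupby only merges consecutive items, so the input has to be ordered by pattern already. I only use groupby from the itertools module: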

# Let's suppose your input is something like this 
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5")] 

from itertools import groupby 

result = [] 

for key, values in groupby(a, lambda x: x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])

print(result) 

Output:

[('p1', ('s1', 's2')), ('p2', 's5')] 

The same solution works if you add more values to your input:

# When you add more values to your input 
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5"), ("p2", "s6"), ("p3", "s7")] 

from itertools import groupby 

result = [] 

for key, values in groupby(a, lambda x: x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])

print(result) 

Output:

[('p1', ('s1', 's2')), ('p2', ('s5', 's6')), ('p3', 's7')] 

Now, if you modify the input structure:

# Let's suppose your modified input is something like this 
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"])] 

from itertools import groupby 

result = [] 

for key, values in groupby(a, lambda x: x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])

print(result) 

Output:

[(['p1'], (['s1'], ['s2'])), (['p2'], ['s5'])] 

Likewise, the same solution works if you add more values to the new input structure:

# When you add more values to your new input 
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"]), (["p2"], ["s6"]), (["p3"], ["s7"])] 

from itertools import groupby 

result = [] 

for key, values in groupby(a, lambda x: x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])

print(result) 

Output:

[(['p1'], (['s1'], ['s2'])), (['p2'], (['s5'], ['s6'])), (['p3'], ['s7'])] 

PS: Test this code, and if it breaks with any other kind of input, please let me know.
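
One kind of input that does change the result is an unsorted list, because itertools.groupby only merges consecutive items with equal keys. Here is a minimal sketch, assuming the same (pattern, sentence) layout as in the examples above; the helper name group_pairs is made up for illustration:

```python
from itertools import groupby


def group_pairs(pairs):
    # Same loop as in the answer, wrapped in a function
    # (group_pairs is a hypothetical name, not from the original post).
    result = []
    for key, values in groupby(pairs, lambda x: x[0]):
        b = tuple(values)
        if len(b) >= 2:
            result.append((key, tuple(j[1] for j in b)))
        else:
            result.append(b[0])
    return result


unsorted_input = [("p1", "s1"), ("p2", "s5"), ("p1", "s2")]

# Equal keys that are not adjacent end up in separate groups.
print(group_pairs(unsorted_input))
# [('p1', 's1'), ('p2', 's5'), ('p1', 's2')]

# Sorting by the key first restores the expected grouping.
print(group_pairs(sorted(unsorted_input, key=lambda x: x[0])))
# [('p1', ('s1', 's2')), ('p2', 's5')]
```

So if the tuples are appended in arbitrary order, sort them by pattern before passing them to groupby.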