2016-08-21 85 views
1

How can I create a dictionary from another dictionary when certain conditions are met? I have this dictionary:

{0: (u'Donald', u'PERSON'), 1: (u'John', u'PERSON'), 2: (u'Trump', u'PERSON'), 14: (u'Barack', u'PERSON'), 15: (u'Obama', u'PERSON'), 17: (u'Michelle', u'PERSON'), 18: (u'Obama', u'PERSON'), 30: (u'Donald', u'PERSON'), 31: (u'John', u'PERSON'), 32: (u'Trump', u'PERSON')}

I want to create another dictionary from it, like this:

{u'Donald John Trump': 2, u'Barack Obama': 1, u'Michelle Obama': 1}

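Here the keys 0, 1, 2 and 30, 31, 32 each increase by 1, and the name they spell out occurs twice. The runs 14, 15 and 17, 18 each occur once. Is there a way to create such a dictionary?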

+0

(u'Obama', u'PERSON') appears twice in your dictionary, but it isn't included in the result? –

+1

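@BurhanKhalid I think the people are grouped by runs of consecutive keys, so 'Obama' does appear twice, but once as part of 'Barack Obama' and once as part of 'Michelle Obama'. – Delgan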

+0

But the dictionary keys 14 & 15 and 17 & 18 need to be merged first, because each pair increases by 1. – KevinOelen

Answers

3

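I think the main problem you need to solve is identifying each person by grouping the keys that form a run of consecutive integers, as you described.

Fortunately, Python has a recipe for this: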

from itertools import groupby 
from operator import itemgetter 
from collections import defaultdict 

dct = { 
    0: ('Donald', 'PERSON'), 
    1: ('John', 'PERSON'), 
    2: ('Trump', 'PERSON'), 
    14: ('Barack', 'PERSON'), 
    15: ('Obama', 'PERSON'), 
    17: ('Michelle', 'PERSON'), 
    18: ('Obama', 'PERSON'), 
    30: ('Donald', 'PERSON'), 
    31: ('John', 'PERSON'), 
    32: ('Trump', 'PERSON') 
} 

persons = defaultdict(int) # Used for convenience
keys = sorted(dct.keys()) # So groupby() can recognize sequences 

for k, g in groupby(enumerate(keys), lambda d: d[0] - d[1]): 
    ids = map(itemgetter(1), g)    # [0, 1, 2], [14, 15], etc. 
    person = ' '.join(dct[i][0] for i in ids) # "Donald John Trump", "Barack Obama", etc 
    persons[person] += 1 

print(persons) 
# defaultdict(<class 'int'>, 
#  {'Barack Obama': 1, 
#   'Donald John Trump': 2, 
#   'Michelle Obama': 1}) 
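
The grouping key `d[0] - d[1]` may look opaque at first. A minimal sketch of why it works, using only the sorted keys from the question: within a run of consecutive keys, both the position and the key grow by 1, so their difference stays constant, and `groupby()` starts a new group whenever that difference changes. The names `diffs` and `runs` are introduced here for illustration only.

```python
from itertools import groupby

keys = [0, 1, 2, 14, 15, 17, 18, 30, 31, 32]

# Position minus key is constant inside each run of consecutive keys.
diffs = [index - key for index, key in enumerate(keys)]
print(diffs)  # [0, 0, 0, -11, -11, -12, -12, -23, -23, -23]

# Grouping on that difference therefore yields one group per run.
runs = [
    [key for _, key in group]
    for _, group in groupby(enumerate(keys), lambda pair: pair[0] - pair[1])
]
print(runs)  # [[0, 1, 2], [14, 15], [17, 18], [30, 31, 32]]
```

This relies on the keys being sorted and unique, which is why the answer calls `sorted()` first.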
+0

Amazing! Thank you very much – KevinOelen

2
def add_name(d, consecutive_keys, result):
    # Join the first element of each tuple in the run into one name
    result_key = ' '.join(d[k][0] for k in consecutive_keys)
    if result_key in result:
        result[result_key] += 1
    else:
        result[result_key] = 1

d = {0: (u'Donald', u'PERSON'), 1: (u'John', u'PERSON'), 2: (u'Trump', u'PERSON'), 
    14: (u'Barack', u'PERSON'), 15: (u'Obama', u'PERSON'), 
    17: (u'Michelle', u'PERSON'), 18: (u'Obama', u'PERSON'), 
    30: (u'Donald', u'PERSON'), 31: (u'John', u'PERSON'), 32: (u'Trump', u'PERSON')} 

sorted_keys = sorted(d.keys()) 
last_key = sorted_keys[0] 
consecutive_keys = [last_key] 
result = {} 
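for i in sorted_keys[1:]:
    if i == last_key + 1:
        # Still inside the same run of consecutive keys
        consecutive_keys.append(i)
    else:
        # The run ended: record its name and start a new run
        add_name(d, consecutive_keys, result)
        consecutive_keys = [i]
    last_key = i
# Record the final run, which the loop never flushes
add_name(d, consecutive_keys, result)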

print(result) 

Output

{'Donald John Trump': 2, 'Barack Obama': 1, 'Michelle Obama': 1} 
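
The same loop could also be wrapped in a reusable function. The following is only a sketch under a few assumptions: `count_names` is a hypothetical name not used in the answer above, `collections.Counter` stands in for the manual if/else counting, and an empty input is assumed to produce an empty result.

```python
from collections import Counter

def count_names(d):
    """Count names built from runs of consecutive integer keys (hypothetical helper)."""
    counts = Counter()
    run = []         # name parts collected for the current run
    previous = None  # last key seen, None before the first key
    for key in sorted(d):
        if previous is not None and key != previous + 1:
            # Gap in the keys: the current run is complete
            counts[' '.join(run)] += 1
            run = []
        run.append(d[key][0])
        previous = key
    if run:
        # Flush the last run
        counts[' '.join(run)] += 1
    return dict(counts)

sample = {
    0: ('Donald', 'PERSON'), 1: ('John', 'PERSON'), 2: ('Trump', 'PERSON'),
    14: ('Barack', 'PERSON'), 15: ('Obama', 'PERSON'),
    17: ('Michelle', 'PERSON'), 18: ('Obama', 'PERSON'),
    30: ('Donald', 'PERSON'), 31: ('John', 'PERSON'), 32: ('Trump', 'PERSON'),
}
names = count_names(sample)
print(names)
```

Unlike the script above, this version does not index `sorted_keys[0]`, so it would not raise on an empty dictionary.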
+0

This works too! Thanks! – KevinOelen
