2014-09-19 95 views
1

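Filtering most_common() for a Counter in Python

I have a Counter of the form {(f1, f2): count}. Calling Counter.most_common() on it gives the correct result, but I want to restrict most_common() to entries that match a filter on f2. For example, f2 = 'A' should return the most common elements whose f2 is 'A'. How can I do this?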

+3

On mobile so I can't verify, but try `sorted([item for item in counter.items() if item[0][1] == 'A'], key=operator.itemgetter(1), reverse=True)[:10]` – roippi 2014-09-19 04:24:55

+0

@roippi It works. If you post it as an answer, I'll accept it. – codepk 2014-09-19 15:15:28
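
For reference, the one-liner from the comment above can be written out as a small runnable sketch. The sample data and the names `matching` and `top_a` are made up for illustration; only the filter-then-sort idea comes from the comment.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical sample data; keys are (f1, f2) tuples.
counter = Counter({(1, "A"): 5, (2, "A"): 3, (1, "B"): 7, (3, "A"): 1})

# Keep only the items whose second key component (f2) is "A" ...
matching = [item for item in counter.items() if item[0][1] == "A"]

# ... then sort by count, highest first, and keep at most the top 10.
top_a = sorted(matching, key=itemgetter(1), reverse=True)[:10]
print(top_a)
```

This sorts every matching item, which is fine for small counters; the answer below avoids the full sort when only a few items are needed.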

Answer

0

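If we look at the source code for Counter, we see that it uses heapq to stay at O(n log k), where k is the number of requested keys and n is the size of the Counter, rather than the O(n log n) of a full sort: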

def most_common(self, n=None): 
    '''List the n most common elements and their counts from the most 
    common to the least. If n is None, then list all element counts. 

    >>> Counter('abcdeabcdabcaba').most_common(3) 
    [('a', 5), ('b', 4), ('c', 3)] 

    ''' 
    # Emulate Bag.sortedByCount from Smalltalk 
    if n is None: 
        return sorted(self.items(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.items(), key=_itemgetter(1)) 
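
As a quick sanity check on the source above, the following sketch compares the two code paths on the docstring's own example. The variable names `via_heap` and `via_sort` are illustrative only; the input has no tied counts, so the ordering is unambiguous.

```python
import heapq
from collections import Counter
from operator import itemgetter

counts = Counter('abcdeabcdabcaba')

# Path taken when n is given: a bounded heap selection.
via_heap = heapq.nlargest(3, counts.items(), key=itemgetter(1))

# Equivalent result obtained by fully sorting and slicing.
via_sort = sorted(counts.items(), key=itemgetter(1), reverse=True)[:3]

print(via_heap == via_sort == counts.most_common(3))
```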

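Since most_common() is already at least O(n), an extra O(n) filtering pass does not change the asymptotic cost, so we can simply filter the counter and take its most common items: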

from collections import Counter

counts = Counter([(1, "A"), (2, "A"), (1, "A"), (2, "B"), (1, "B")])

Counter({(f1, f2): n for (f1, f2), n in counts.items() if f2 == "A"}).most_common(2) 
#>>> [((1, 'A'), 2), ((2, 'A'), 1)] 

Unrolling it might make it slightly faster, if that matters:

import heapq 
from operator import itemgetter 

filtered = [((f1, f2), n) for (f1, f2), n in counts.items() if f2 == "A"] 
heapq.nlargest(2, filtered, key=itemgetter(1)) 
#>>> [((1, 'A'), 2), ((2, 'A'), 1)]
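
If the filter changes often, the same idea could be wrapped in a small helper. This is only a sketch: `most_common_where` and its `predicate` parameter are hypothetical names, not part of `collections.Counter`, and the function simply mirrors the two branches of `most_common()` shown earlier.

```python
import heapq
from collections import Counter
from operator import itemgetter


def most_common_where(counter, predicate, n=None):
    """Return (key, count) pairs whose key satisfies predicate,
    ordered from most to least common (hypothetical helper)."""
    # Lazily filter the items so no intermediate Counter is built.
    filtered = (item for item in counter.items() if predicate(item[0]))
    if n is None:
        # No limit requested: sort everything that passed the filter.
        return sorted(filtered, key=itemgetter(1), reverse=True)
    # Limit requested: select only the n largest counts.
    return heapq.nlargest(n, filtered, key=itemgetter(1))


counts = Counter([(1, "A"), (2, "A"), (1, "A"), (2, "B"), (1, "B")])
result = most_common_where(counts, lambda key: key[1] == "A", n=2)
print(result)
```

Passing the predicate the whole key, rather than just f2, keeps the helper usable for filters on f1 or on both components.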