2017-06-13

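Python difference between two lists with possible duplicate values

I have two lists, each containing non-unique numbers, meaning the same value can appear in a list more than once.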

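I need to find the difference between the two, taking into account that the same value may appear several times (so I can't just take the difference between the two sets). In other words, I need to check whether a value appears more times in the first list than in the second.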

The lists are:

l1 = [1, 2, 5, 3, 3, 4, 9, 8, 2] 
l2 = [1, 1, 3, 2, 4, 8, 9] 

# Sorted and justified 
l1 = [1, 2, 2, 3, 3, 4, 5, 8, 9] 
l2 = [1, 1, 2, 3, 4, 8, 9] 

The list elements can be strings, integers, or floats. The resulting lists should be:

difference(l1, l2) == [3, 5, 2] 
# There is an extra 2 and 3 in l1 that is not in l2, and a 5 in l1 but not l2. 

difference(l2, l1) == [1] 
# The extra 1 is the only value in l2 but not in l1. 

I have tried the list comprehension [x for x in l1 if x not in l2], but it doesn't work because it doesn't take the duplicate values in both lists into account.
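To make the failure concrete, here is what both a set difference and the attempted comprehension return for the lists above. Neither one notices the extra 2 and 3:

```python
l1 = [1, 2, 5, 3, 3, 4, 9, 8, 2]
l2 = [1, 1, 3, 2, 4, 8, 9]

# Set difference collapses duplicates, so the surplus 2 and 3 disappear.
set_diff = set(l1) - set(l2)

# The membership test only asks "is x anywhere in l2?", not "how many times?".
naive_diff = [x for x in l1 if x not in l2]

print(set_diff)    # {5}
print(naive_diff)  # [5]
```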

What have you tried? – depperm

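I tried a list comprehension, the only thing I could think of for this case without writing a looping function: [x for x in l1 if x not in l2] doesn't work. – clg4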

Are the values always integers, or do you need a more general solution? –

Answers

If order doesn't matter, you can use a Counter (see the collections module in the standard library):

from collections import Counter 

l1 = [1,2,5,3,3,4,9,8,2] 
l2 = [1,1,3,2,4,8,9] 

c1 = Counter(l1) # Counter({2: 2, 3: 2, 1: 1, 5: 1, 4: 1, 9: 1, 8: 1}) 
c2 = Counter(l2) # Counter({1: 2, 3: 1, 2: 1, 4: 1, 8: 1, 9: 1}) 

# elements() repeats each value by its surplus count; keys() would list it only once
diff1 = list((c1 - c2).elements()) # [2, 5, 3] 
diff2 = list((c2 - c1).elements()) # [1] 

This is fairly general, and it works with strings too:

... 
l1 = ['foo', 'foo', 'bar'] 
l2 = ['foo', 'bar', 'bar', 'baz'] 
... 
# diff1 == ['foo'] 
# diff2 == ['bar', 'baz'] 
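Collecting the result with .keys() versus .elements() makes a difference once a value is over-represented by more than one copy. .keys() lists each surplus value once, while Counter.elements() repeats it by its remaining count. The lists a and b below are made up for illustration:

```python
from collections import Counter

a = [3, 3, 3, 7]
b = [3]

diff = Counter(a) - Counter(b)  # Counter({3: 2, 7: 1})

# keys() reports each surplus value once, losing how many extras there are.
print(list(diff.keys()))      # [3, 7]
# elements() yields each value as many times as its (positive) count.
print(list(diff.elements()))  # [3, 3, 7]
```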

I have a feeling that many people will come here looking for a multiset difference (e.g. [1, 1, 1, 2, 2, 2, 3, 3] - [1, 2, 2] == [1, 1, 2, 3, 3]), so I'll post that answer here as well:

import collections

def multiset_difference(a, b):
    """Compute a - b of two multisets a and b"""
    a = collections.Counter(a)
    b = collections.Counter(b)

    difference = a - b
    return difference # Remove this line if you want it as a list

    as_list = []
    for item, count in difference.items():
        as_list.extend([item] * count)
    return as_list

def ordered_multiset_difference(a, b):
    """As above, but preserves order and is O(ab) worst case"""
    difference = list(a)
    depleted = set() # Values already exhausted from difference, so the list isn't searched for them again
    for i in b:
        if i not in depleted:
            try:
                difference.remove(i)
            except ValueError:
                depleted.add(i)
    return difference
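The O(ab) cost of ordered_multiset_difference comes from list.remove, which scans the list on every call. Below is a sketch of an order-preserving alternative that counts b once with a Counter and then walks a a single time, cancelling the first occurrences just as the remove-based version does. It assumes the elements are hashable (true for the strings, ints, and floats in the question). The function name is only illustrative:

```python
from collections import Counter

def ordered_multiset_difference_linear(a, b):
    """Order-preserving a - b in O(len(a) + len(b)); elements must be hashable."""
    remaining = Counter(b)  # how many copies of each value b can still cancel
    result = []
    for item in a:
        if remaining[item] > 0:
            remaining[item] -= 1   # cancel this copy against one in b
        else:
            result.append(item)    # b has no copy left to cancel it
    return result

print(ordered_multiset_difference_linear([1, 1, 1, 2, 2, 2, 3, 3], [1, 2, 2]))
# [1, 1, 2, 3, 3]
print(ordered_multiset_difference_linear([1, 2, 5, 3, 3, 4, 9, 8, 2],
                                         [1, 1, 3, 2, 4, 8, 9]))
# [5, 3, 2]
```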

Using a Counter is probably the better option, but if you want to roll your own:

def diff(a, b):
    result = []
    cpy = b[:]
    for ele in a:
        if ele in cpy:
            cpy.remove(ele)
        else:
            result.append(ele)
    return result
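When an element is present, `ele in cpy` followed by `cpy.remove(ele)` walks the list twice. The variant below (its name is hypothetical) lets remove do the membership test and catches the ValueError that list.remove raises when the value is missing. Like the loop above, it only relies on ==, so it also works for unhashable elements such as lists, which a Counter cannot count:

```python
def diff_single_scan(a, b):
    """Order-preserving a - b using only equality comparisons."""
    result = []
    cpy = list(b)  # work on a copy so the caller's b is left untouched
    for ele in a:
        try:
            cpy.remove(ele)      # removes the first equal item, or raises
        except ValueError:
            result.append(ele)   # no copy of ele left in cpy
    return result

l1 = [1, 2, 5, 3, 3, 4, 9, 8, 2]
l2 = [1, 1, 3, 2, 4, 8, 9]
print(diff_single_scan(l1, l2))                  # [5, 3, 2]
print(diff_single_scan(l2, l1))                  # [1]
print(diff_single_scan([[1], [1], [2]], [[1]]))  # [[1], [2]]
```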

Or, as a somewhat abusive one-liner:

def diff(a, b): 
    return [ele for ele in a if ele not in b or b.remove(ele)] 

The one-liner destroys b while computing the difference, so you may want to pass it a copy, diff(l1, l2[:]), or use:

def diff(a, b): 
    cpy = b[:] 
    return [ele for ele in a if ele not in cpy or cpy.remove(ele)] 
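The one-liners rely on list.remove returning None. When ele is in b, `ele not in b` is False, so the `or` goes on to evaluate b.remove(ele). That call deletes one copy and returns None, which is falsy, so ele is dropped from the result. The check below (diff_mutating is just a label for the first one-liner) shows the side effect on the question's lists and that falsy values such as 0 are kept correctly:

```python
def diff_mutating(a, b):
    # First one-liner: removes matched elements from b itself.
    return [ele for ele in a if ele not in b or b.remove(ele)]

def diff(a, b):
    # Copying one-liner: the caller's b is not modified.
    cpy = b[:]
    return [ele for ele in a if ele not in cpy or cpy.remove(ele)]

l1 = [1, 2, 5, 3, 3, 4, 9, 8, 2]
l2 = [1, 1, 3, 2, 4, 8, 9]

print(diff(l1, l2))            # [5, 3, 2]
print(l2)                      # [1, 1, 3, 2, 4, 8, 9]  (unchanged)
print(diff([0, 0, 1], [1]))    # [0, 0]
print(diff_mutating(l1, l2))   # [5, 3, 2]
print(l2)                      # [1]  (matched copies were removed from l2)
```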

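'if ele in cpy: cpy.remove(ele)' scans the list twice. The one-liner doesn't handle falsy values correctly (e.g. 'diff([0,0,1],[1])' -> '[]'). If you want a simple one-liner, just use a list comprehension instead: '[ele for ele in a if ele in cpy or cpy.remove(ele)]'. But again, this scans the list twice every time it needs to remove something. – Artyer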

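@Artyer I fixed the filter... I'm curious whether you actually tried your list comprehension change, because it gives me a 'ValueError'. And, as I mentioned, a Counter is probably the better choice. – TemporalWolf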

Oops, I forgot a 'not': '[ele for ele in a if ele not in cpy or cpy.remove(ele)]' – Artyer