2015-07-20 77 views
2

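Selective comparison of class objects

I need to compare class objects in several ways. However, only the values of selected fields should take part in the comparison, i.e.: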

class Class:
    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value

obj1 = Class(1, 2, 3, 'a')
obj2 = Class(1, 2, 3, 'b')  # compare(obj1, obj2) = True
obj3 = Class(1, 2, 4, 'a')  # compare(obj1, obj3) = False

Currently I do it this way:

def dumm_compare(obj1, obj2):
    if obj1.field1 != obj2.field1:
        return False
    if obj1.field2 != obj2.field2:
        return False
    if obj1.field3 != obj2.field3:
        return False
    return True

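Since the actual number of relevant fields in my case is greater than 10, this approach leads to rather bulky code. That is why I tried something like this: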

def cute_compare(obj1, obj2):
    for field in filter(lambda x: x.startswith('field'), dir(obj1)):
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True

The code is compact; however, performance suffers badly:

import time 

starttime = time.time() 
for i in range(100000): 
    dumm_compare(obj1, obj2) 
print('Dumm compare runtime: {:.3f} s'.format(time.time() - starttime)) 

starttime = time.time() 
for i in range(100000): 
    cute_compare(obj1, obj2) 
print('Cute compare runtime: {:.3f} s'.format(time.time() - starttime))

#Dumm compare runtime: 0.046 s 
#Cute compare runtime: 1.603 s 

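Is there a way to implement selective object comparison more efficiently? I actually need several such functions (they compare objects by different, sometimes overlapping, sets of fields). That is why I do not want to override the built-in class methods.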

+1

Do you know in advance how many fields there are? –

+1

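It would be quicker to be explicit about which fields *should* be compared, e.g. with a class attribute 'COMPARE_FIELDS = ['field1', 'field2', ...]', and then iterate over that. – jonrsharpe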
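
A minimal sketch of the suggestion in the comment above, assuming the class can carry a tuple of field names. The names `Record` and `compare_fields` are made up for illustration and do not appear in the question; only `COMPARE_FIELDS` comes from the comment.

```python
class Record:
    # Explicit list of the attributes that take part in the comparison.
    COMPARE_FIELDS = ('field1', 'field2', 'field3')

    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value


def compare_fields(obj1, obj2, fields=Record.COMPARE_FIELDS):
    """Return True if both objects agree on every attribute named in fields."""
    return all(getattr(obj1, name) == getattr(obj2, name) for name in fields)


a = Record(1, 2, 3, 'a')
b = Record(1, 2, 3, 'b')
c = Record(1, 2, 4, 'a')

print(compare_fields(a, b))                                # True
print(compare_fields(a, c))                                # False
print(compare_fields(a, c, fields=('field1', 'field2')))   # True
```

Passing a different `fields` tuple gives a second comparison over another, possibly overlapping, set of fields without touching `__eq__`.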

Answers

1

If the fields exist on all instances in a particular comparison set, try saving the list of fields to compare on the class.

def prepped_compare(obj1, obj2):
    li_field = getattr(obj1, "li_field", None)
    if li_field is None:
        # grab the list from the compared object, but this assumes a
        # fixed field list per run.
        # mind you, getattr(obj, nonexistent_field) blows up anyway,
        # so you are making that assumption already
        li_field = [f for f in vars(obj1) if f.startswith('field')]
        obj1.__class__.li_field = li_field

    for field in li_field:
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True

Or, better, precompute the list outside the function:

def prepped_compare2(obj1, obj2, li_field):
    for field in li_field:
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True


starttime = time.time()
li_field = [f for f in vars(obj1) if f.startswith('field')]
for i in range(100000):
    prepped_compare2(obj1, obj2, li_field)
print('prepped2 compare runtime: {:.3f} s'.format(time.time() - starttime))

Output:

Dumm compare runtime: 0.051 s 
Cute compare runtime: 0.762 s 
prepped compare runtime: 0.122 s 
prepped2 compare runtime: 0.093 s 
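
Since the question asks for several comparison functions over different field sets, the precomputed-list idea can be wrapped in a small factory. The following is only a sketch: `make_comparator` is a hypothetical helper, not part of the answer, and it swaps the explicit loop for `operator.attrgetter`. Unlike the loop, it always fetches every listed attribute before comparing, and it needs at least one field name. Whether it is actually faster should be measured on your own data.

```python
from operator import attrgetter
from types import SimpleNamespace


def make_comparator(*fields):
    """Build a function that compares two objects on the given attribute names."""
    get = attrgetter(*fields)  # built once, reused for every comparison

    def compare(obj1, obj2):
        # attrgetter returns a tuple for several names, a bare value for one.
        return get(obj1) == get(obj2)

    return compare


# Stand-ins for the objects from the question.
obj1 = SimpleNamespace(field1=1, field2=2, field3=3, irrelevant_field='a')
obj2 = SimpleNamespace(field1=1, field2=2, field3=3, irrelevant_field='b')
obj3 = SimpleNamespace(field1=1, field2=2, field3=4, irrelevant_field='a')

compare_123 = make_comparator('field1', 'field2', 'field3')
compare_12 = make_comparator('field1', 'field2')

print(compare_123(obj1, obj2))  # True
print(compare_123(obj1, obj3))  # False
print(compare_12(obj1, obj3))   # True
```

Each comparator carries its own field set, so overlapping sets do not interfere with one another and no class method is overridden.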

Re. overriding __eq__: I'm pretty sure you could have something like this.

def mycomp01(self, obj2): ...  # possibly with a saved field list01 on the class
def mycomp02(self, obj2): ...  # possibly with a saved field list02 on the class

# let's do comp01.
Class.__eq__ = mycomp01
# run comp01 tests
Class.__eq__ = mycomp02
# run comp02 tests
1

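dir() does not just list instance attributes; it also traverses the class hierarchy. As such it does far more work here than needed; dir() is really only suited to debugging tasks.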
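
A short illustration of the difference, using a throwaway class (`Sample` is not from the question):

```python
class Sample:
    def __init__(self):
        self.field1 = 1
        self.irrelevant_field = 'a'


obj = Sample()

# vars() exposes only what is stored on the instance itself.
print(sorted(vars(obj)))                 # ['field1', 'irrelevant_field']

# dir() also reports names inherited from the class and from object.
print(len(dir(obj)) > len(vars(obj)))    # True
print('__init__' in dir(obj))            # True
print('__init__' in vars(obj))           # False
```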

Stick to using vars() instead, perhaps combined with all():

def faster_compare(obj1, obj2):
    obj2_vars = vars(obj2)
    return all(value == obj2_vars[field]
               for field, value in vars(obj1).items()
               if field.startswith('field'))

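vars() returns a dictionary containing only the attributes of the instance; in the generator expression above I access both the attribute name and its value in one step by using the dict.items() method.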

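I replaced the getattr() call for obj2 with the same dictionary approach; this saves a frame stack push and pop each time, because the key lookup can be handled entirely in bytecode (C code). Note that this does assume you are not using properties; only actual instance attributes will be listed.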
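
To make the caveat about properties concrete, here is a small sketch with a hypothetical class (`WithProperty` is not from the question). A property lives on the class, so a comparison based on vars() would silently skip it:

```python
class WithProperty:
    def __init__(self, value):
        self.field1 = value

    @property
    def field2(self):
        # Computed on access; never stored in the instance __dict__.
        return self.field1 * 2


obj = WithProperty(1)

print('field1' in vars(obj))    # True
print('field2' in vars(obj))    # False
print('field2' in dir(obj))     # True
print(getattr(obj, 'field2'))   # 2
```

If some of the compared fields are properties, an explicit list of names combined with getattr() still works.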

This approach still has to do more work than the hard-coded if branches, but at least it does not perform all that badly:

>>> from timeit import timeit 
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, dumm_compare as compare') 
0.349234500026796 
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, cute_compare as compare') 
16.48695448896615 
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, faster_compare as compare') 
1.9555692840367556 
+0

Shouldn't that be 'return not any(value != obj2_vars[field] ...'? – overactor

+0

@overactor: oops, I had the test inverted there; it should have used 'all()'. –