Diffing binary files in Python

I have two binary files. They look something like this, though the data is more random:

File A:

FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF ...

File B:

41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37 ...

What I would like is to call something like:

>>> someDiffLib.diff(file_a_data, file_b_data)

and receive something like:

[Match(pos=4, length=4)]

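indicating that the two files hold 4 identical bytes starting at byte offset 4. The sequence 44 43 42 41 would not match, because it is not at the same position in each file.

Is there a library that will do this diff for me? Or should I just write the loops to do the comparison myself?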

Source

2013-04-03 omghai2u

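http://docs.python.org/2/library/difflib.html - first result on Google for "python diff" – Andrey 2013-04-03 21:26:51

Possible duplicate of "Difference between two strings in Python/PHP" (http://stackoverflow.com/questions/1209800/difference-between-two-strings-in-python-php) – Andrey 2013-04-03 21:27:45

@Andrey Thanks, I tried that, but it looks like 'get_matching_blocks()' does not check that the bytes are at the same position in each file, only that the sequence exists in both files. Otherwise, yes, that is exactly what I want. – omghai2u 2013-04-03 21:28:12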
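To illustrate the point in the comment above, here is a small sketch that runs difflib.SequenceMatcher on the sample bytes from the question. The variable names are only for illustration, and bytes.fromhex assumes Python 3. Each block reports an offset in the first input (a), an offset in the second (b), and a size, and nothing forces a and b to be equal.

```python
from difflib import SequenceMatcher

f1 = bytes.fromhex("FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF")
f2 = bytes.fromhex("41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37")

matcher = SequenceMatcher(None, f1, f2, autojunk=False)

# get_matching_blocks() always ends with a zero-length sentinel; drop it
blocks = [m for m in matcher.get_matching_blocks() if m.size]

for m in blocks:
    # m.a and m.b are offsets into f1 and f2; they are allowed to differ
    kind = "same offset" if m.a == m.b else "shifted"
    print(m.a, m.b, m.size, kind)
```

Filtering the blocks down to those with m.a == m.b is not a reliable fix in general, since difflib chooses one alignment and may skip same-offset matches that conflict with it.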

You can use itertools.groupby() for this. Here is an example:

from itertools import groupby 

# this just sets up some byte strings to use, Python 2.x version is below 
# instead of this you would use f1 = open('some_file', 'rb').read() 
f1 = bytes(int(b, 16) for b in 'FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF'.split()) 
f2 = bytes(int(b, 16) for b in '41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37'.split()) 

matches = [] 
for k, g in groupby(range(min(len(f1), len(f2))), key=lambda i: f1[i] == f2[i]): 
    if k: 
        pos = next(g)
        length = len(list(g)) + 1
        matches.append((pos, length))

Or the same thing as above, written as a list comprehension:

matches = [(next(g), len(list(g)) + 1)
           for k, g in groupby(range(min(len(f1), len(f2))), key=lambda i: f1[i] == f2[i])
           if k]

Here is the setup for your example if you are using Python 2.x:

f1 = ''.join(chr(int(b, 16)) for b in 'FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF'.split()) 
f2 = ''.join(chr(int(b, 16)) for b in '41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37'.split())
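For comparison with the groupby version, here is a minimal sketch of the "just write the loops" option from the question. The Match tuple mirrors the output the question asks for. The function name matching_runs is made up for this example, and bytes.fromhex assumes Python 3.

```python
from collections import namedtuple

Match = namedtuple("Match", ["pos", "length"])

def matching_runs(a, b):
    """Return runs where a and b hold equal bytes at the same offset."""
    matches = []
    start = None  # offset where the current run began, or None if not in a run
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] == b[i]:
            if start is None:
                start = i
        elif start is not None:
            # the run ended just before i
            matches.append(Match(start, i - start))
            start = None
    if start is not None:
        # a run that extends to the end of the shorter input
        matches.append(Match(start, n - start))
    return matches

f1 = bytes.fromhex("FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF")
f2 = bytes.fromhex("41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37")
print(matching_runs(f1, f2))  # [Match(pos=4, length=4)]
```

It is longer than the groupby one-liner, but the end-of-input case is explicit and no intermediate lists are built per run.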

Source

2013-04-03 21:43:19

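Hot. I like what you did there. I was hoping for a beautiful answer like this. – omghai2u 2013-04-03 21:47:58

The itertools.groupby solution provided above works fine, but it is pretty slow.

I wrote a fairly naive attempt using numpy and tested it against the other solution on a particular 16MB file I happened to have. It was 42x faster on my machine. Someone familiar with numpy could probably improve on this significantly.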

import numpy as np 

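def compare(path1, path2):
    # Read both files as flat byte arrays and truncate to the shorter one
    x, y = np.fromfile(path1, np.int8), np.fromfile(path2, np.int8)
    length = min(x.size, y.size)
    x, y = x[:length], y[:length]

    # Offsets at which the two files hold the same byte
    z = np.where(x == y)[0]
    if z.size == 0:
        return z

    # A jump other than 1 between consecutive offsets starts a new run
    borders = np.append(np.insert(np.where(np.diff(z) != 1)[0] + 1, 0, 0), len(z))
    lengths = borders[1:] - borders[:-1]
    starts = z[borders[:-1]]
    # One row per run: [start offset, length]
    return np.array([starts, lengths]).T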
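To try the numpy idea on the sample bytes from the question without writing files, here is a rough sketch that works on in-memory bytes. It is a variant, not the function above: compare_bytes is a made-up name, and it finds runs by looking for edges in the boolean equality mask rather than by diffing the matching offsets.

```python
import numpy as np

def compare_bytes(a, b):
    """Return an (n, 2) array of [start, length] rows for same-offset runs."""
    x = np.frombuffer(a, dtype=np.uint8)
    y = np.frombuffer(b, dtype=np.uint8)
    n = min(x.size, y.size)
    equal = x[:n] == y[:n]

    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], equal, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])

    # Edges alternate: run start, run end (exclusive), run start, ...
    starts, ends = edges[::2], edges[1::2]
    return np.column_stack((starts, ends - starts))

f1 = bytes.fromhex("FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF")
f2 = bytes.fromhex("41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37")
print(compare_bytes(f1, f2))  # [[4 4]]
```

With this formulation the no-match case returns an empty array of shape (0, 2), so callers can always iterate over rows.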

Source

2015-06-16 19:10:47 Kevin
