Find the overlapping and non-overlapping regions of two strings

I have pairs of strings like these:

YHFLSPYVY     # answer
   LSPYVYSPR  # prediction
+++******ooo


  YHFLSPYVS  # answer
VEYHFLSPY    # prediction
oo*******++

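As the two examples above show, I want to find the overlapping region (*) and the non-overlapping regions of the answer (+) and of the prediction (o).

How can I do this in Python?

I'm stuck with this: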

import re
# This is example 1
ans = "YHFLSPYVY"
pred = "LSPYVYSPR"
# Lookahead search for the whole prediction inside the answer
matches = re.finditer(r'(?=(%s))' % re.escape(pred), ans)
print([m.start(1) for m in matches])
# []

The result I hope to get, e.g. for example 1:

plus_len = 3 
star_len = 6 
ooo_len = 3

Source

2016-07-27 neversaint

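Do you want the first overlap, or the longest overlap?

Do you also want the string with * + o, or just the values of plus_len etc.?

Looks like [longest common subsequence](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem)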

This is easy with difflib.SequenceMatcher.find_longest_match:

from difflib import SequenceMatcher 

def f(answer, prediction): 
    sm = SequenceMatcher(a=answer, b=prediction) 
    match = sm.find_longest_match(0, len(answer), 0, len(prediction)) 
    star_len = match.size 
    return (len(answer) - star_len, star_len, len(prediction) - star_len)

The function returns a triple of integers (plus_len, star_len, ooo_len):

f('YHFLSPYVY', 'LSPYVYSPR') -> (3, 6, 3) 
f('YHFLSPYVS', 'VEYHFLSPY') -> (2, 7, 2)
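If the marker string itself is wanted rather than just the lengths, the start positions of the match can be used as well. The following is only a sketch: the name `annotate` is made up for illustration, and it assumes the two strings overlap end to end as in the question's examples, so that on each side of the match at most one of the strings has leftover characters.

```python
from difflib import SequenceMatcher

def annotate(answer, prediction):
    # autojunk=False turns off the popularity heuristic used for long sequences
    sm = SequenceMatcher(None, answer, prediction, autojunk=False)
    m = sm.find_longest_match(0, len(answer), 0, len(prediction))
    # Characters left over after the match in each string
    answer_tail = len(answer) - m.a - m.size
    prediction_tail = len(prediction) - m.b - m.size
    # m.a / m.b are the numbers of characters before the match
    return ('+' * m.a + 'o' * m.b
            + '*' * m.size
            + '+' * answer_tail + 'o' * prediction_tail)

print(annotate('YHFLSPYVY', 'LSPYVYSPR'))  # +++******ooo
print(annotate('YHFLSPYVS', 'VEYHFLSPY'))  # oo*******++
```

For strings that do not overlap end to end, the `+` and `o` runs on one side would simply be concatenated, which may or may not be the desired picture.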

Source

2016-07-27 12:48:05 vaultah

SO is like a super-smart brain: in less than a second, the question is answered :D!

You can use difflib:

import difflib

ans = "YHFLSPYVY"
pred = "LSPYVYSPR"

def get_overlap(s1, s2):
    # Longest block of characters common to both strings
    s = difflib.SequenceMatcher(None, s1, s2)
    pos_a, pos_b, size = s.find_longest_match(0, len(s1), 0, len(s2))
    return s1[pos_a:pos_a+size]

overlap = get_overlap(ans, pred)
plus = ans.replace(overlap, "")  # answer without the overlap
oo = pred.replace(overlap, "")   # prediction without the overlap

print(len(overlap))
print(len(plus))
print(len(oo))
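One caveat: str.replace removes every occurrence of the overlap, so if the overlapping text happens to appear more than once in a string, the remaining length comes out too small. A possible variant, sketched here with a hypothetical helper name `split_parts`, slices around the match positions instead:

```python
from difflib import SequenceMatcher

def split_parts(s1, s2):
    sm = SequenceMatcher(None, s1, s2, autojunk=False)
    m = sm.find_longest_match(0, len(s1), 0, len(s2))
    overlap = s1[m.a:m.a + m.size]
    # Cut out only the matched block, leaving other occurrences alone
    rest1 = s1[:m.a] + s1[m.a + m.size:]
    rest2 = s2[:m.b] + s2[m.b + m.size:]
    return overlap, rest1, rest2

overlap, plus, oo = split_parts("YHFLSPYVY", "LSPYVYSPR")
print(len(overlap), len(plus), len(oo))  # 6 3 3
```

With an input such as "ABAB" and "ABX", this keeps the second "AB" of the first string, whereas replace would strip both.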

Source

2016-07-27 12:52:35
