查找2个字符串之间的最大子序列

给定2个字符串，查找2个字符串之间的最大字符重叠顺序。查找2个字符串之间的最大子序列

例如：

“klccconcertcenter”
“klconventioncenter”

在序列的最大重叠是 “klconetcenter”

这被例示：

KLC立方厘米上ÇËřtcenter
klcon v ËÑ吨离子中心

顺便说一句，这不是家庭作业。我实际上在生产中使用它来测试2个字符串的相似程度。例如，比较“ABC私人有限公司”和“ABC私人有限公司”，我将采用最大顺序重叠，除以长度较短的字符串，对这两个字符串的相关程度进行0-1等级评定。

我实际上有一个解决方案，但它是一个近似值。我幼稚的递归方法不适用于长字符串。

好，因为人们都在问我的解决方案：

def match_count(phrase1, phrase2): 
    """This approximates match_count_recur() because the recursive function does not scale for long strings""" 
    MAX_MATCH_COUNT_WIDTH = 15 
    if len(phrase1) > MAX_MATCH_COUNT_WIDTH: 
     return match_count(phrase1[:len(phrase1)/2], phrase2) + match_count(phrase1[len(phrase1)/2:], phrase2) 
    return match_count_recur(phrase1, phrase2) 


def match_count_recur(phrase1, phrase2): 
    """ 
    Checks the number of characters that intersect (in order) between 2 phrases 
    """ 
    if len(phrase2) < 1: return 0 
    if len(phrase1) < 1: return 0 
    if len(phrase1) == 1: return 1 if phrase1 in phrase2 else 0 
    if len(phrase2) == 1: return 1 if phrase2 in phrase1 else 0 
    if phrase1 in phrase2: return len(phrase1) 
    if phrase2 in phrase1: return len(phrase2) 

    char = phrase1[0] 
    current_count = 1 if char in phrase2 else 0 
    phrase2_idx = phrase2.index(char) + 1 if char in phrase2 else 0 

    no_skip_count = current_count + match_count(phrase1[1:], phrase2[phrase2_idx:]) 
    skip_count = match_count(phrase1[1:], phrase2) 

    return max(no_skip_count, skip_count) 

def get_similarity_score(phrase1, phrase2): 
    """ 
    Gets the similarity score of 2 phrases 
    """ 
    phrase1 = phrase1.lower().replace(" ", "") 
    phrase2 = phrase2.lower().replace(" ", "") 
    shorter_phrase = phrase2 
    longer_phrase = phrase1 
    if len(phrase1) < len(phrase2): 
     shorter_phrase = phrase1 
     longer_phrase = phrase2 
    return float(match_count(shorter_phrase, longer_phrase))/float(len(shorter_phrase))

来源

2017-02-16 nubela

您可能希望https://en.wikipedia.org/wiki/Levenshtein_distance为其确定存在现有的Python实现。 – ceejayoz

无论是不是家庭作业都没关系 - 你还没有显示任何工作。你应该提供你工作的证据，让用户试着弄清楚为什么你的方法“不能按比例缩放长字符串” – asongtoruin

@asongtoruin信仰在哪里？编辑并添加我的解决方案 – nubela

声音对我说，你正在试图解决的最长子问题（wiki page）。 This有一个可能很有趣的Python实现。

Python中还有一个内置的实现类似这样的东西，在difflib模块中。特别是difflib.Differ类。

来源

2017-02-16 16:33:29 Bendik

看起来像我现有的解决方案实际上解决了这个问题。这只是缓慢（如果字符串超长），因为问题的复杂性。 – nubela

这个问题有可能在二次时间内解决（所需时间随着每个字符串长度的乘积而增长），所以它实际上并不是很复杂。聪明的做法（见链接）可能会带来巨大的收益。 – Bendik

查找2个字符串之间的最大子序列

回答

相关问题