2016-02-27

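Write a function named spelling_corrector. The function should check each word of the input string against all the words in the correct_spells list and return a string such that:

  • If a word in the original sentence exactly matches a word in correct_spells, the word is not modified and should be copied directly to the output string.

  • If a word in the sentence can be made to match a word in the correct_spells list by replacing, inserting, or deleting a single character, that word should be replaced by the matching word from the correct_spells list.

  • If neither of the two previous conditions is true, the word from the original string should not be modified and should be copied directly to the output string.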
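One way to test the second rule is to generate every string that is one replace, insert, or delete away from the word and check whether a list entry is among them. This is a minimal sketch, assuming words contain only lowercase ASCII letters (which holds after lowercasing, per the notes below); the helper name `single_edits` is hypothetical, and this candidate-generation approach is an alternative to the pairwise comparison used in the answer.

```python
import string


def single_edits(word: str) -> set:
    """Hypothetical helper: all strings reachable from `word` by exactly one
    replace, insert, or delete of a lowercase ASCII letter."""
    letters = string.ascii_lowercase
    # Every way to cut the word into a left part and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Drop the first letter of the right part.
    deletes = {left + right[1:] for left, right in splits if right}
    # Swap the first letter of the right part for a different letter.
    replaces = {
        left + c + right[1:]
        for left, right in splits
        if right
        for c in letters
        if c != right[0]
    }
    # Put a new letter between the two parts.
    inserts = {left + c + right for left, right in splits for c in letters}
    return deletes | replaces | inserts


print("case" in single_edits("cas"))   # True (insert 'e')
print("car" in single_edits("cas"))    # True (replace 's' with 'r')
print("that" in single_edits("thes"))  # False (two letters differ)
```

For a word of length n this builds roughly 52n + 26 candidates before deduplication, which is cheap for short words but does more work than comparing against a short correct_spells list directly.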

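Notes:

  • Do not spell-check one- or two-letter words (copy them directly to the output string).

  • In case of a tie, use the first word from the correct_spells list.

  • Ignore case, i.e. treat uppercase letters as the same as lowercase letters.

  • All characters in the output string should be lowercase.

  • Assume the input string contains only alphabetic characters (a-z and A-Z) and spaces.

  • Remove extra spaces between words.

  • Remove spaces at the beginning and end of the output string.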
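Most of the case and whitespace notes fall out of Python's built-in string methods: `str.split()` with no argument splits on runs of whitespace and discards leading and trailing whitespace. A minimal sketch; the helper names `normalize` and `needs_check` are hypothetical.

```python
def normalize(sentence: str) -> list:
    """Hypothetical helper: lowercase the sentence and split it into words.

    split() with no separator collapses repeated spaces and drops
    spaces at both ends, covering the last two notes.
    """
    return sentence.lower().split()


def needs_check(word: str) -> bool:
    """Hypothetical helper: one- and two-letter words are copied unchanged."""
    return len(word) > 2


words = normalize("  Thes   is the Firs cas ")
print(words)                                # ['thes', 'is', 'the', 'firs', 'cas']
print([w for w in words if needs_check(w)]) # ['thes', 'the', 'firs', 'cas']
print(" ".join(words))                      # thes is the firs cas
```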

Examples:

[Image: example calls and their expected output]

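Explanation:

  • In the first example, 'thes' is not replaced with anything, because no word in the list is a single edit away from it.

  • In the first example, both 'case' and 'car' could replace 'cas' in the original sentence, but 'case' is chosen because it is encountered first.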
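The tie rule amounts to scanning correct_spells in order and stopping at the first word that is one edit away. A minimal sketch, assuming the word is already lowercased and has already passed the exact-match and short-word checks (this helper does not repeat them); `one_edit_apart` and `pick_correction` are hypothetical names.

```python
def one_edit_apart(a: str, b: str) -> bool:
    """True if b can be reached from a by exactly one replace, insert or delete."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    # Make `a` the shorter (or equal-length) string, so a deletion from the
    # longer word is handled as an insertion into the shorter one.
    if len(a) > len(b):
        a, b = b, a
    # Advance past the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Replacement: everything after the first mismatch must agree.
        return a[i + 1:] == b[i + 1:]
    # Insertion: skip one letter of the longer word and compare the rest.
    return a[i:] == b[i + 1:]


def pick_correction(word: str, correct_spells: list) -> str:
    """Return the first list entry one edit away from word, else word itself."""
    return next((good for good in correct_spells if one_edit_apart(word, good)), word)


print(pick_correction("cas", ["that", "first", "case", "car"]))   # case
print(pick_correction("cas", ["that", "first", "car", "case"]))   # car
print(pick_correction("thes", ["that", "first", "case", "car"]))  # thes
```

Because `next()` returns the first item the generator yields, list order alone decides ties, with no extra bookkeeping.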

This is what I have tried, but it hasn't produced anything very useful:

def spelling_corrector(input_string, input_list):
    new_string = input_string.lower().split()
    count = 0
    for x in new_string:
        for y in input_list:
            for i in y:
                if i not in x:
                    count += 1
        if count == 1:
            print(y)
        if len(x) == len(y) or x not in input_list:
            print(x)

spelling_corrector("Thes is the Firs cas", ['that','first','case','car'])
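Part of why this attempt misbehaves: its innermost loop only counts letters of `y` that appear nowhere in `x`, so it ignores letter order and position, and `count` is never reset between words. A small hypothetical helper, `missing_letters`, that mirrors that inner loop shows the problem:

```python
def missing_letters(word: str, candidate: str) -> int:
    """Hypothetical helper mirroring the inner loop above: count the letters
    of candidate that appear nowhere in word."""
    return sum(1 for ch in candidate if ch not in word)


# 'case' is one insertion away from 'cas' and scores 1 ...
print(missing_letters("cas", "case"))  # 1
# ... but 'aces' also scores 1, although it is not a single edit away.
print(missing_letters("cas", "aces"))  # 1
# A rearrangement scores 0 even though it is not a match.
print(missing_letters("cas", "sac"))   # 0
```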

For the second rule, see Levenshtein distance (https://en.wikipedia.org/wiki/Levenshtein_distance). – RedLaser
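Following the comment's suggestion: Levenshtein distance counts single-letter insertions, deletions and substitutions, so rule 1 corresponds to a distance of 0 and rule 2 to a distance of exactly 1. A standard two-row dynamic-programming sketch (the function name `levenshtein` is our own):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and replace each cost 1."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # replace (free if letters match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("cas", "case"))    # 1
print(levenshtein("firs", "first"))  # 1
print(levenshtein("thes", "that"))   # 2
```

Computing the full distance costs O(len(a) * len(b)) per pair, which is more than a yes/no test at distance 1 needs; a direct comparison that stops at the first mismatch does the same job in linear time.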

Answer

def replace_1(bad: str, good: str) -> bool:
    """Return True if bad can be converted to good by replacing 1 letter.
    """
    if len(bad) != len(good):
        return False

    for i, ch in enumerate(bad):
        if ch != good[i]:
            # At the first mismatch, the rest of both words must agree.
            return bad[i+1:] == good[i+1:]

    # No mismatch at all: the words are identical, not one replacement apart.
    return False

def insert_1(bad: str, good: str) -> bool:
    """Return True if bad can be converted to good by inserting 1 letter.
    """
    if len(bad) != len(good) - 1:
        return False

    for i, ch in enumerate(bad):
        if ch != good[i]:
            # Skip the inserted letter in good and compare the remainders.
            return bad[i:] == good[i+1:]

    # At this point, all of bad matches the first part of good, so the
    # missing letter is appended at the end.
    return True

def delete_1(bad: str, good: str) -> bool:
    """Return True if bad can be converted to good by deleting 1 letter.
    """
    if len(bad) != len(good) + 1:
        return False
    # Deleting a letter from bad is the same as inserting it into good.
    return insert_1(good, bad)


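def correction(word: str, correct_spells: list) -> str:
    # Words of one or two letters are never spell-checked.
    if len(word) < 3:
        return word
    # An exact match wins over any one-edit match elsewhere in the list.
    if word in correct_spells:
        return word
    # Try the candidates in list order, so a tie goes to the first one.
    for good in correct_spells:
        if replace_1(word, good):
            return good
        if insert_1(word, good):
            return good
        if delete_1(word, good):
            return good

    return word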

def spelling_corrector(sentence:str, correct_spells:list) -> str: 
    words = sentence.strip().lower().split() 
    correct_lower = [cs.lower() for cs in correct_spells] 
    result = [correction(w, correct_lower) for w in words] 
    return ' '.join(result) 

tests = (
    ('Thes is the Firs cas', "that first case car", 'thes is the first case'), 
    ('programming is fan and easy', "programming this fun easy hook", 'programming is fun and easy'), 
    ('Thes is vary essy', "this is very very easy", 'this is very easy'), 
    ('Wee lpve Python', "we Live In Python", 'we live python'), 
) 

if __name__ == "__main__":
    for t in tests:
        correct = t[1].split()
        print(t[0], "|", t[1], "|", t[2])
        print("Result:", spelling_corrector(t[0], correct))
        assert spelling_corrector(t[0], correct) == t[2]