Is there a built-in method in nltk to find words/phrases that closely match a given word?

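The speech recognition software I am using gives less than optimal results. Is there a built-in method in nltk to find words/phrases that closely match a given word?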

For example: session is returned as fashion or mission.

Right now I have a dictionary like this:

matches = { 
    'session': ['fashion', 'mission'], 
    ... 
}

I loop over all of its entries to find a match.

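I don't mind false positives, since the application accepts only a limited set of keywords. However, manually entering the new words for each keyword is tedious. Also, the speech recognizer comes up with new words every time I speak.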

I am also running into trouble where a long word is returned as a set of smaller words, so the approach above will not work in that case.

So, is there a built-in method in nltk to do this? Or could I even write a better algorithm myself?

Source

2016-04-14 rohithpr

You may want to look at python-Levenshtein. It is a Python C extension module for computing string distances and similarities.

Silly, inefficient code like this might work:

from Levenshtein import jaro_winkler  # May not be module name

heard_word = "brain"
possible_words = ["watermelon", "brian"]

# Score every candidate, then pick the one with the highest similarity.
word_scores = [jaro_winkler(heard_word, possible) for possible in possible_words]
guessed_word = possible_words[word_scores.index(max(word_scores))]

print('I heard {0} and guessed {1}'.format(heard_word, guessed_word))

Here are the documentation and the unmaintained repo.
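As for the nltk part of the question: NLTK does ship an `edit_distance` function in `nltk.metrics.distance`, which computes the same kind of string distance. To show what such a distance measures without needing any third-party install, here is a minimal pure-Python sketch of Levenshtein distance. The helper `closest_keyword` and the `KEYWORDS` list are assumptions made up for illustration; they are not part of nltk or python-Levenshtein.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_keyword(heard_word, keywords):
    """Hypothetical helper: return the keyword with the smallest distance."""
    return min(keywords, key=lambda keyword: levenshtein(heard_word, keyword))


KEYWORDS = ["session", "watermelon"]  # hypothetical keyword list

for heard in ["fashion", "mission"]:
    print(heard, "->", closest_keyword(heard, KEYWORDS))
```

Since the application only accepts a limited set of keywords and false positives are acceptable, always taking the nearest keyword may be enough; a maximum-distance threshold could be added if unrelated words should be rejected.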

Source

2016-05-04 19:50:34 bkrn

You could use fuzzywuzzy, a Python package for fuzzy matching of words and strings.

Install the package.

pip install fuzzywuzzy

Sample code relevant to your question.

from fuzzywuzzy import fuzz

MIN_MATCH_SCORE = 80  # fuzz.ratio returns a score from 0 to 100

heard_word = "brain"

possible_words = ["watermelon", "brian"]

# Keep every candidate whose score reaches the threshold (the result is a list).
guessed_word = [word for word in possible_words if fuzz.ratio(heard_word, word) >= MIN_MATCH_SCORE]

print('I heard {0} and guessed {1}'.format(heard_word, guessed_word))

Here are the fuzzywuzzy documentation and repo.
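The question also mentions a long word coming back as several smaller words. One possible way to handle that, sketched here with the standard library's `difflib` rather than fuzzywuzzy, is to join runs of adjacent recognized tokens and score each joined candidate against the keyword list. The function name `match_keywords`, the `max_span` and `cutoff` parameters, and the sample tokens are all assumptions for illustration, and the cutoff would need tuning on real recognizer output.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def match_keywords(tokens, keywords, max_span=3, cutoff=0.7):
    """Hypothetical helper: scan recognized tokens left to right and,
    at each position, join 1..max_span adjacent tokens and keep the
    best-scoring keyword if its score reaches the cutoff."""
    found = []
    i = 0
    while i < len(tokens):
        best_score, best_keyword, best_span = 0.0, None, 1
        for span in range(1, min(max_span, len(tokens) - i) + 1):
            candidate = "".join(tokens[i:i + span]).lower()
            for keyword in keywords:
                score = similarity(candidate, keyword)
                if score > best_score:
                    best_score, best_keyword, best_span = score, keyword, span
        if best_score >= cutoff:
            found.append(best_keyword)
            i += best_span  # skip the tokens that formed the match
        else:
            i += 1  # nothing starting here resembles a keyword
    return found


KEYWORDS = ["watermelon", "session"]  # hypothetical keyword list
print(match_keywords(["water", "melon", "mission"], KEYWORDS))
```

Scoring every span and taking the best one, instead of accepting the first span that clears the cutoff, keeps a run such as "water melon mission" from being swallowed by a single keyword. It still produces false positives, which the question says are acceptable.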

Source

2016-05-05 12:45:25
