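keyWords would be more useful as a dictionary: getting the score for each word then becomes a simple dictionary lookup. Each word can be extracted with split().
Here is some code to do that. It assumes punctuation is part of a word (as the keySentences result list in your example implies):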
listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.']
keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = dict(keyWords)  # word -> score lookup

keySentences = []
for sentence in listOfSentences:
    # Words that are not keywords contribute 0 to the score.
    score = sum(keyWords.get(word, 0) for word in sentence.split())
    if score > 0:
        keySentences.append((score, sentence))

# Highest score first; keep only the sentence text.
keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)]
print(keySentences)
Output
['bam bam bam she also loves ham.', 'she ate the lamb.']
If you want to ignore punctuation, you can remove it from each sentence before processing:
import string

# mapping to remove punctuation with str.translate()
remove_punctuation = {ord(c): None for c in string.punctuation}

listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.']
keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = dict(keyWords)  # word -> score lookup

keySentences = []
for sentence in listOfSentences:
    # Strip punctuation before splitting so 'lamb.' is scored as 'lamb'.
    score = sum(keyWords.get(word, 0) for word in sentence.translate(remove_punctuation).split())
    if score > 0:
        keySentences.append((score, sentence))

# Highest score first; keep only the sentence text.
keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)]
print(keySentences)
Output
['bam bam bam she also loves ham.', 'she ate the lamb.', 'mary had a little lamb.']
The result list now also includes 'mary had a little lamb.', because the trailing period after 'lamb' is removed by str.translate().
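The two variants can also be folded into a single helper. This is only a minimal sketch: the name rank_sentences and the strip_punctuation flag are made up for illustration and do not come from the question. It also sorts on the score alone, so sentences with equal scores stay in their original order instead of being ordered by their text.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_REMOVE_PUNCTUATION = str.maketrans('', '', string.punctuation)


def rank_sentences(sentences, keyword_scores, strip_punctuation=True):
    """Return the sentences that score above zero, highest score first.

    Hypothetical helper: keyword_scores maps each keyword to its score.
    """
    scored = []
    for sentence in sentences:
        text = sentence.translate(_REMOVE_PUNCTUATION) if strip_punctuation else sentence
        score = sum(keyword_scores.get(word, 0) for word in text.split())
        if score > 0:
            scored.append((score, sentence))
    # Sort on the score only; list.sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for score, sentence in scored]


listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.']
keyWords = {'bam': 3, 'lamb': 2, 'ate': 1}

print(rank_sentences(listOfSentences, keyWords))
# ['bam bam bam she also loves ham.', 'she ate the lamb.', 'mary had a little lamb.']
print(rank_sentences(listOfSentences, keyWords, strip_punctuation=False))
# ['bam bam bam she also loves ham.', 'she ate the lamb.']
```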
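This will also match 'late' for 'ate' – The6thSense
OP only said words, so maybe he needs an exact match – Hackaholic
That is why I said you are doing a partial match here; I don't understand how you arrived at this logic from what OP asked – The6thSense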
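The difference these comments are arguing about can be seen in a small sketch. The sentence below is a made-up example, not taken from the question: a substring test with in finds 'ate' inside 'late', while a test against the words produced by split() does not.

```python
sentence = 'she was late again'  # made-up example sentence
keyword = 'ate'

# Substring test: matches because 'late' contains 'ate'.
partial_match = keyword in sentence

# Whole-word test: compares against the list of words from split().
exact_match = keyword in sentence.split()

print(partial_match, exact_match)
# True False
```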