Compare the sub-elements of one list with another list

I have a list of sentences, listOfSentences, that looks like this:

listOfSentences = ['mary had a little lamb.', 
        'she also had a little pram.', 
        'bam bam bam she also loves ham.', 
        'she ate the lamb.']

I also have a keyWords dictionary that looks like this:

keyWords= {('bam', 3), ('lamb', 2), ('ate', 1)}

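where the number paired with each word is its frequency, and the more frequent a word is, the earlier it appears in keyWords. The result I want is: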

>>> print(keySentences)
['bam bam bam she also loves ham.', 'she ate the lamb.']

My question is: how can I compare the elements of keyWords with the elements of listOfSentences so that I can output the list keySentences?

2015-10-13 Marko

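keyWords is more useful if it is a real dictionary, because then getting each word's score is a simple dictionary lookup. The individual words can be extracted with split().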

Here is some code that does this. It assumes that punctuation is part of a word (as your example result list keySentences implies):

listOfSentences = ['mary had a little lamb.', 
        'she also had a little pram.', 
        'bam bam bam she also loves ham.', 
        'she ate the lamb.'] 

keyWords= [('bam', 3), ('lamb', 2), ('ate', 1)] 
keyWords = dict(keyWords) 

keySentences = [] 
for sentence in listOfSentences: 
    score = sum(keyWords.get(word, 0) for word in sentence.split()) 
    if score > 0:
        keySentences.append((score, sentence))

keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)] 
print(keySentences)

Output:

['bam bam bam she also loves ham.', 'she ate the lamb.']
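One side effect of sorting (score, sentence) tuples with reverse=True is that sentences with equal scores are ordered by comparing the sentences themselves, in reverse alphabetical order. If you would rather keep ties in their original list order, a minimal sketch is to sort on the score alone; the helper name score() is introduced here for illustration and is not part of the answer above:

```python
# Sample data from the question, with keyWords already converted to a dict
listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.']
keyWords = dict([('bam', 3), ('lamb', 2), ('ate', 1)])


def score(sentence):
    # Same scoring as above: split() on whitespace, punctuation stays attached
    return sum(keyWords.get(word, 0) for word in sentence.split())


# Keep sentences with a positive score and sort on the score alone.
# sorted() is stable (also with reverse=True), so equal scores keep list order.
keySentences = sorted((s for s in listOfSentences if score(s) > 0),
                      key=score, reverse=True)
print(keySentences)
```

Because the key function is called per element, score() runs twice for each kept sentence here; for a short list that is negligible.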

If you want to ignore punctuation, you can remove it from each sentence before processing:

import string 

# mapping to remove punctuation with str.translate() 
remove_punctuation = {ord(c): None for c in string.punctuation} 

listOfSentences = ['mary had a little lamb.', 
        'she also had a little pram.', 
        'bam bam bam she also loves ham.', 
        'she ate the lamb.'] 

keyWords= [('bam', 3), ('lamb', 2), ('ate', 1)] 
keyWords = dict(keyWords) 

keySentences = [] 
for sentence in listOfSentences: 
    score = sum(keyWords.get(word, 0) for word in sentence.translate(remove_punctuation).split()) 
    if score > 0:
        keySentences.append((score, sentence))

keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)] 
print(keySentences)

Output:

['bam bam bam she also loves ham.', 'she ate the lamb.', 'mary had a little lamb.']

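Now the result list also includes 'mary had a little lamb.', because str.translate() removes the trailing full stop, so 'lamb.' becomes 'lamb' and matches the key.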
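To see the difference directly, here is a small sketch that compares the tokens split() produces for that sentence with and without the punctuation-removing translation (the variable names raw_words and clean_words are just for illustration):

```python
import string

# Same mapping as above: each ASCII punctuation character maps to None (deleted)
remove_punctuation = {ord(c): None for c in string.punctuation}

sentence = 'mary had a little lamb.'
raw_words = sentence.split()
clean_words = sentence.translate(remove_punctuation).split()

print(raw_words)    # ['mary', 'had', 'a', 'little', 'lamb.']
print(clean_words)  # ['mary', 'had', 'a', 'little', 'lamb']
print('lamb' in raw_words, 'lamb' in clean_words)  # False True
```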

2015-10-13 11:53:48 mhawke

Try something like this:

>>> [x for x in listOfSentences for i in keyWords if x.count(i[0])==i[1]] 
['bam bam bam she also loves ham.', 'she ate the lamb.']

2015-10-13 11:33:17 Hackaholic

This will also match 'late'. – The6thSense

The OP only mentioned words; maybe they need an exact match. – Hackaholic

That is my point: you are doing a partial match here, and I don't see how you got this logic from what the OP asked. – The6thSense
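The partial-match problem raised in the comments comes from str.count(), which counts substrings. A minimal sketch of a whole-word version of the comprehension above follows; it assumes that "words" means runs of \w characters, and whole_word_count() is a hypothetical helper, not something from the answer:

```python
import re

listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.']
keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)]

# str.count() matches substrings, so 'ate' is also found inside 'late'
print('she was late'.count('ate'))  # 1


def whole_word_count(sentence, word):
    # Assumption: a word is a run of \w characters (letters, digits, underscore)
    return re.findall(r'\w+', sentence).count(word)


print(whole_word_count('she was late', 'ate'))  # 0

# Same "count equals the listed number" test, but on whole words;
# any() adds each sentence at most once even if several keywords match
keySentences = [s for s in listOfSentences
                if any(whole_word_count(s, w) == n for w, n in keyWords)]
print(keySentences)  # ['bam bam bam she also loves ham.', 'she ate the lamb.']
```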

The following scores your sentences by the number of matching words:

import re 

keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)] 
keyWords = [w for w, c in keyWords]  # only need the words 

listOfSentences = [ 
    'mary had a little lamb.', 
    'she also had a little pram.', 
    'bam bam bam she also loves ham.', 
    'she ate the lamb.']  

words = [re.findall(r'(\w+)', s) for s in listOfSentences] 
keySentences = [] 

for word_list, sentence in zip(words, listOfSentences): 
    keySentences.append((len([word for word in word_list if word in keyWords]), sentence)) 

for count, sentence in sorted(keySentences, reverse=True):
    print('{:2} {}'.format(count, sentence))  # Python 3 print() function

This gives the following output:

 3 bam bam bam she also loves ham.
 2 she ate the lamb.
 1 mary had a little lamb.
 0 she also had a little pram.
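A possible variant of this approach, sketched under the assumption that you only want sentences with at least one match: it stores the key words in a set (a membership test on a list scans every element, while a set lookup is constant time on average) and drops sentences whose count is zero. The names keyWordSet and match_count() are illustrative only:

```python
import re

listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.']
# Only the words matter here; a set gives fast membership tests
keyWordSet = {w for w, _ in [('bam', 3), ('lamb', 2), ('ate', 1)]}


def match_count(sentence):
    # Count tokens (runs of \w characters) that are key words; repeats count each time
    return sum(1 for word in re.findall(r'\w+', sentence) if word in keyWordSet)


# Drop sentences without any match, highest count first (ties keep list order)
keySentences = sorted((s for s in listOfSentences if match_count(s) > 0),
                      key=match_count, reverse=True)
print(keySentences)
```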

2015-10-13 12:26:32