2011-04-04 315 views
13

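Finding synonyms, definitions, and example sentences using WordNet

I need to take a text file containing a word as input. Then I need to find the lemma names, definition, and examples of the word's synsets using WordNet. I have gone through the books "Python Text Processing with NLTK 2.0 Cookbook" and "Natural Language Processing using NLTK" to help me in this direction. Although I understand how to do this interactively in the terminal, I have not been able to do the same in a script written in a text editor.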

For example, if the input text contains the word "flabbergasted", the output must look like this:

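flabbergasted
(verb) flabbergast, boggle, bowl over - overcome with amazement; "This boggles the mind!"
(adjective) dumbfounded, dumfounded, flabbergasted, stupefied, thunderstruck, dumbstruck, dumbstricken - as if struck dumb with astonishment and surprise; "a circle of policement stood dumbfounded by her denial of having seen the accident"; "the flabbergasted aldermen were speechless"; "was thunderstruck by the news of his promotion"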

The synsets, definitions, and example sentences are taken directly from WordNet!

I have the following code:


from __future__ import division 
import nltk 
from nltk.corpus import wordnet as wn 


tokenizer = nltk.data.load('tokenizers/punkt/english.pickle') 
fp = open("inpsyn.txt") 
data = fp.read() 

#to tokenize input text into sentences 

print '\n-----\n'.join(tokenizer.tokenize(data))# splits text into sentences 

#to tokenize the tokenized sentences into words 

tokens = nltk.wordpunct_tokenize(data) 
text = nltk.Text(tokens) 
words = [w.lower() for w in text] 
print words  #to print the tokens 

for a in words: 
    print a 

syns = wn.synsets(a) 
print "synsets:", syns 

for s in syns: 
    for l in s.lemmas: 
     print l.name 
    print s.definition 
    print s.examples 

I get the following output:


flabbergasted 

['flabbergasted'] 
flabbergasted 
synsets: [Synset('flabbergast.v.01'), Synset('dumbfounded.s.01')] 
flabbergast 
boggle 
bowl_over 
overcome with amazement 
['This boggles the mind!'] 
dumbfounded 
dumfounded 
flabbergasted 
stupefied 
thunderstruck 
dumbstruck 
dumbstricken 
as if struck dumb with astonishment and surprise 
['a circle of policement stood dumbfounded by her denial of having seen the accident', 'the flabbergasted aldermen were speechless', 'was thunderstruck by the news of his promotion'] 

Is there a way to retrieve the part of speech along with each group of lemma names?

+1

If you ever log back in, you should accept Andrey's answer, especially since he not only answered but also responded to your comments to help you. – 2012-11-25 21:02:49

Answers

22
def synset(word): 
    wn.synsets(word) 

This function doesn't return anything, so by default you get None. You should write:

def synset(word): 
    return wn.synsets(word) 
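
The difference is easy to check without WordNet at all. The sketch below uses a plain dictionary as a stand-in for the lookup; `FAKE_WORDNET`, `synset_without_return`, and `synset_with_return` are names made up for illustration and are not part of NLTK. A function body that only evaluates an expression discards the result, so the call evaluates to None.

```python
# Stand-in data so the example runs without NLTK; this is not real WordNet output.
FAKE_WORDNET = {"car": ["car.n.01", "car.n.02"]}

def synset_without_return(word):
    # The lookup result is computed and then thrown away.
    FAKE_WORDNET.get(word, [])

def synset_with_return(word):
    # The lookup result is handed back to the caller.
    return FAKE_WORDNET.get(word, [])

print(synset_without_return("car"))  # None
print(synset_with_return("car"))     # ['car.n.01', 'car.n.02']
```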

To extract the lemma names:

>>> from nltk.corpus import wordnet
>>> syns = wordnet.synsets('car')
>>> syns[0].lemmas[0].name
'car'
>>> [s.lemmas[0].name for s in syns]
['car', 'car', 'car', 'car', 'cable_car']
>>> [l.name for s in syns for l in s.lemmas]
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']
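
On the part-of-speech question: each synset also carries a one-letter tag (`n`, `v`, `a`, `s`, `r`). The following is only a sketch, and it assumes NLTK 3, where `pos()`, `lemma_names()`, `definition()`, and `examples()` are methods rather than attributes as in the snippets here. The helper names `POS_LABELS`, `describe_synset`, and `describe_word` are invented for this example.

```python
# Human-readable names for WordNet's part-of-speech tags.
# 's' marks a "satellite adjective"; it is shown here as a plain adjective.
POS_LABELS = {"n": "noun", "v": "verb", "a": "adjective",
              "s": "adjective", "r": "adverb"}

def describe_synset(synset):
    """Format one synset as '(pos) lemma, lemma - definition; "example"'."""
    pos = POS_LABELS.get(synset.pos(), synset.pos())
    lemmas = ", ".join(name.replace("_", " ") for name in synset.lemma_names())
    examples = "".join('; "%s"' % ex for ex in synset.examples())
    return "(%s) %s - %s%s" % (pos, lemmas, synset.definition(), examples)

def describe_word(word):
    """Look the word up in WordNet and describe every synset found."""
    # Imported here so describe_synset can be used and tested without NLTK.
    # Assumption: NLTK 3 and the WordNet corpus are installed.
    from nltk.corpus import wordnet as wn
    return [describe_synset(s) for s in wn.synsets(word)]
```

Calling `describe_word('flabbergasted')` and printing each returned string should give one line per synset, close to the layout asked for in the question.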
+0

Thank you so much! :) What a silly mistake! – aks 2011-04-04 06:07:30

+0

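Is there a way I can extract just the word from the synset and pass it as an argument? For example, for the word flabbergasted, you get Synset('flabbergast.v.01') and Synset('dumbfounded.s.01'). How do I pass these as arguments to the lemma_names function? – aks 2011-04-04 06:57:47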

+1

from nltk.corpus import wordnet; syns = wordnet.synsets('car'); [s.lemmas[0].name for s in syns] gives ['car', 'car', 'car', 'car', 'cable_car'] – Andrey 2011-04-04 07:15:08

5

Here I have created a module that can easily be imported and used. When you pass it a string, it returns all the lemma names for that string.

Module:

#!/usr/bin/python2.7
'''Pass a string to this function (e.g. 'car') and it will give you a list of
words related to that word, called the lemmas of the word.'''
from nltk.corpus import wordnet as wn

def lemmalist(word):
    # Collect the lemma names of every synset of the word.
    # Note: lemma_names is an attribute in NLTK 2.x, as used here.
    syn_set = []
    for synset in wn.synsets(word):
        for item in synset.lemma_names:
            syn_set.append(item)
    return syn_set

Usage:

Note: the module file is named lemma.py, hence "from lemma import lemmalist".

>>> from lemma import lemmalist 
>>> lemmalist('car') 
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car'] 
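
The list repeats 'car' because the word appears in several synsets. If only distinct names are wanted, one option is to filter the result afterwards. This is a small sketch; `unique_in_order` is a hypothetical helper and not part of the module above.

```python
def unique_in_order(names):
    """Drop repeated names while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique

# The list returned by lemmalist('car') in the usage example.
lemmas = ['car', 'auto', 'automobile', 'machine', 'motorcar', 'car',
          'railcar', 'railway_car', 'railroad_car', 'car', 'gondola',
          'car', 'elevator_car', 'cable_car', 'car']
print(unique_in_order(lemmas))
# ['car', 'auto', 'automobile', 'machine', 'motorcar', 'railcar',
#  'railway_car', 'railroad_car', 'gondola', 'elevator_car', 'cable_car']
```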

Cheers!

+0

It gives the error 'ImportError: No module named lemma' – 2017-01-19 11:23:48

0
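from nltk.corpus import wordnet

# Collect every lemma name from every synset of "car" (NLTK 3 method calls).
synonyms = []
for syn in wordnet.synsets("car"):
    for l in syn.lemmas():
        synonyms.append(l.name())
print(synonyms)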
+0

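Please edit your answer to include more information. Code-only and "try this" answers are discouraged because they contain no searchable content and don't explain why someone should "try this". – BrokenBinary 2016-10-25 00:24:14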

0

In NLTK 3.0, lemma_names changed from an attribute to a method. So if you get an error saying:

TypeError: 'method' object is not iterable 

you can fix it with:

>>> from nltk.corpus import wordnet as wn 
>>> [item for sysnet in wn.synsets('car') for item in sysnet.lemma_names()] 

This will output:

[
    'car', 'auto', 'automobile', 'machine', 'motorcar', 'car',
    'railcar', 'railway_car', 'railroad_car', 'car', 'gondola',
    'car', 'elevator_car', 'cable_car', 'car'
]
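
If the same script has to run under both the older and the newer API, one possible approach is to check whether the member is callable before using it. This is just a sketch: `call_or_value` and `lemma_names_of` are hypothetical helpers, not NLTK functions.

```python
def call_or_value(obj, name):
    """Return obj.<name>, calling it first if it turns out to be a method."""
    value = getattr(obj, name)
    return value() if callable(value) else value

def lemma_names_of(synsets):
    """Flatten the lemma names of several synsets into one list."""
    return [item for s in synsets for item in call_or_value(s, "lemma_names")]
```

With NLTK installed, `lemma_names_of(wn.synsets('car'))` would then work whether `lemma_names` is an attribute or a method.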