2011-03-30 62 views
6

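Is there a way to get WordNet selectional restrictions (such as +animate, +human, etc.) from synsets through NLTK? Or is there any other way of getting semantic information about a synset? The closest I could get was the hypernym relations.

WordNet selectional restrictions in NLTK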

Answers

4

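It depends on what you mean by "selectional restrictions", or what I would call semantic features. In classic semantics there exists a world of concepts, and to compare concepts we have to find:

  • discriminating features (i.e. features of the concepts that are used to distinguish them from each other)
  • similarity features (i.e. features that the concepts share, which is what creates the need to distinguish them in the first place)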

For example:

Man is [+HUMAN], [+MALE], [+ADULT] 
Woman is [+HUMAN], [-MALE], [+ADULT] 

[+HUMAN] and [+ADULT] = similarity features 
[+-MALE] is the discriminating feature 
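
The Man/Woman comparison above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not anything WordNet provides: the feature dictionaries and the helper name `compare_features` are made up for this example.

```python
def compare_features(a, b):
    """Split two feature dicts into similarity and discriminating features.

    Each dict maps a feature name to True ([+F]) or False ([-F]).
    Only features present in both dicts are compared.
    """
    shared = {f: v for f, v in a.items() if f in b and b[f] == v}
    discriminating = sorted(f for f in a if f in b and a[f] != b[f])
    return shared, discriminating


# Hand-written feature sets (hypothetical, taken from the example above)
man = {"HUMAN": True, "MALE": True, "ADULT": True}
woman = {"HUMAN": True, "MALE": False, "ADULT": True}

shared, discriminating = compare_features(man, woman)
print(shared)          # {'HUMAN': True, 'ADULT': True}
print(discriminating)  # ['MALE']
```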

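The common problem with traditional semantics, and with applying this theory in computational semantics, is the question:

"Is there a specific list of features that we can use to compare any concepts?"

"If so, what are the features on this list?"

(see www.acl.ldc.upenn.edu/E/E91/E91-1034.pdf for more details)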

Getting back to WordNet, I can suggest two methods to resolve the "selectional restrictions".

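First, check the hypernyms for the discriminating features, but before that you must decide what the discriminating features are. To differentiate an animal from a human, let's take the discriminating features to be [+-human] and [+-animal].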

from nltk.corpus import wordnet as wn

# Concepts to compare
dog_sense = wn.synsets('dog')[0]  # It's http://goo.gl/b9sg9X
jb_sense = wn.synsets('James_Baldwin')[0]  # It's http://goo.gl/CQQIG9

# hypernym_paths() returns a list of paths (a list of lists), because a
# synset can reach the root through more than one chain of hypernyms.
# Only the first path is used here.
dog_hypernyms = dog_sense.hypernym_paths()[0]
jb_hypernyms = jb_sense.hypernym_paths()[0]

# Discriminating features in terms of concepts in WordNet
human = wn.synset('person.n.01')  # i.e. [+human]
animal = wn.synset('animal.n.01')  # i.e. [+animal]

# [+human], [-animal]
if human in jb_hypernyms and animal not in jb_hypernyms:
    print("James Baldwin is human")
else:
    print("James Baldwin is not human")

# [+animal], [-human]
if animal in dog_hypernyms and human not in dog_hypernyms:
    print("Dog is an animal")
else:
    print("Dog is not an animal")
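
The check above only looks at the first hypernym path. A slightly more general sketch, assuming you want to accept a feature found on any path, could look like the following. The helper names `has_ancestor` and `classify` are hypothetical; the only NLTK call relied on is `Synset.hypernym_paths()`. Note that each path ends with the synset itself, so a synset counts as its own ancestor here.

```python
def has_ancestor(synset, feature):
    """Return True if `feature` appears on any hypernym path of `synset`."""
    return any(feature in path for path in synset.hypernym_paths())


def classify(synset, features):
    """Return the labels of all candidate features found among the ancestors.

    `features` maps a label such as "human" to a feature synset.
    """
    return [label for label, feat in features.items()
            if has_ancestor(synset, feat)]
```

With the names from the snippet above, the call would be `classify(dog_sense, {'human': human, 'animal': animal})`.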

Second, check the similarity measures, as @Jacob suggested.

from nltk.corpus import wordnet as wn

dog_sense = wn.synsets('dog')[0]  # It's http://goo.gl/b9sg9X
jb_sense = wn.synsets('James_Baldwin')[0]  # It's http://goo.gl/CQQIG9

# Features to check against whether the 'dubious' concept is a human or an animal
human = wn.synset('person.n.01')  # i.e. [+human]
animal = wn.synset('animal.n.01')  # i.e. [+animal]

# Compute each Wu-Palmer similarity once and compare the two scores
animal_sim = dog_sense.wup_similarity(animal)
human_sim = dog_sense.wup_similarity(human)

if animal_sim > human_sim:
    print("Dog is more of an animal than human")
elif animal_sim < human_sim:
    print("Dog is more of a human than animal")
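
If you have more than two candidate features, the pairwise comparison above can be folded into a small helper. This is a rough sketch, not part of NLTK: `closest_feature` is a hypothetical name, and the `None` check is there on the assumption that `wup_similarity` may return `None` in some NLTK versions when no common ancestor is found.

```python
def closest_feature(sense, features):
    """Return (label, score) for the feature synset most similar to `sense`.

    `features` maps a label to a feature synset. Returns None when no
    similarity score is available. Ties are broken by label name.
    """
    scored = []
    for label, feat in features.items():
        score = sense.wup_similarity(feat)
        if score is not None:  # skip features with no usable score
            scored.append((score, label))
    if not scored:
        return None
    score, label = max(scored)
    return label, score
```

With the names from the snippet above, the call would be `closest_feature(dog_sense, {'human': human, 'animal': animal})`.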
+0

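Thank you for the detailed answer. I realized some time ago that, for the reasons you mentioned, there was no way for me to find similarity/discriminating features in WordNet. – erickrf 2013-12-04 17:38:31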

0

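You could try using some of the similarity functions with handpicked synsets, and use that to filter. But it's essentially the same as following the hypernym tree - afaik all the WordNet similarity functions use hypernym distance in their calculations. Also, a synset has a lot of optional attributes that might be worth exploring, but their presence can be very inconsistent.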
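
One attribute that may be worth a look is the lexicographer file name, which NLTK exposes as `Synset.lexname()` and which gives values such as `noun.animal` or `noun.person`. It is much coarser than a real feature list, so treat the following as a rough filter only. The helper names `coarse_class` and `filter_by_class` are made up for this sketch.

```python
def coarse_class(synset):
    """Return the lexicographer category, e.g. 'animal' from 'noun.animal'."""
    _pos, _, category = synset.lexname().partition(".")
    return category


def filter_by_class(synsets, wanted):
    """Keep only synsets whose lexicographer category equals `wanted`."""
    return [s for s in synsets if coarse_class(s) == wanted]
```

For instance, `filter_by_class(wn.synsets('dog'), 'animal')` would keep only the senses of "dog" filed under the animal category.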