2010-10-20 57 views
1

How can I derive more-general, less-general, and equivalence relations from WordNet?
Tags: wordnet, relationship

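The WordNet similarity in RitaWordnet returns a number such as -1.0, 0.222, or 1.0, but how do I find out whether one word is more general or less general than another? Which tool is best suited to that? Please help.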

The code below throws a java.lang.NullPointerException right after it prints the "Holonyms for" heading.

package wordnet; 

import rita.wordnet.RiWordnet; 

public class Main { 
    public static void main(String[] args) { 
     try { 
      // Would pass in a PApplet normally, but we don't need to here 
      RiWordnet wordnet = new RiWordnet(); 
      wordnet.setWordnetHome("/usr/share/wordnet/dict"); 
      // Demo finding parts of speech 
      String word = "first name"; 
      System.out.println("\nFinding parts of speech for " + word + "."); 
      String[] partsofspeech = wordnet.getPos(word); 
      for (int i = 0; i < partsofspeech.length; i++) { 
       System.out.println(partsofspeech[i]); 
      } 

      //word = "eat"; 
      String pos = wordnet.getBestPos(word); 
      System.out.println("\n\nDefinitions for " + word + ":"); 
      // Get an array of glosses for a word 
      String[] glosses = wordnet.getAllGlosses(word, pos); 
      // Display all definitions 
      for (int i = 0; i < glosses.length; i++) { 
       System.out.println(glosses[i]); 
      } 

      // Demo finding a list of related words (synonyms) 
      //word = "first name"; 
      String[] poss = wordnet.getPos(word); 
      for (int j = 0; j < poss.length; j++) { 
       System.out.println("\n\nSynonyms for " + word + " (pos: " + poss[j] + ")"); 
       String[] synonyms = wordnet.getAllSynonyms(word, poss[j], 10); 
       for (int i = 0; i < synonyms.length; i++) { 
        System.out.println(synonyms[i]); 
       } 
      } 

      // Demo finding a list of related words 
      // X is Hypernym of Y if every Y is of type X 
      // Hyponym is the inverse 
      //word = "nurse"; 
      pos = wordnet.getBestPos(word); 
      System.out.println("\n\nHyponyms for " + word + ":"); 
      String[] hyponyms = wordnet.getAllHyponyms(word, pos); 
      // The lookup may return null when nothing is found, so guard the loop 
      if (hyponyms != null) { 
       for (int i = 0; i < hyponyms.length; i++) { 
        System.out.println(hyponyms[i]); 
       } 
      } 

      System.out.println("\n\nHypernyms for " + word + ":"); 
      String[] hypernyms = wordnet.getAllHypernyms(word, pos); 
      // Guard against a null result before looping 
      if (hypernyms != null) { 
       for (int i = 0; i < hypernyms.length; i++) { 
        System.out.println(hypernyms[i]); 
       } 
      } 

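      System.out.println("\n\nHolonyms for " + word + ":"); 
      String[] holonyms = wordnet.getAllHolonyms(word, pos); 
      // This is where the NullPointerException came from: without the 
      // null check, holonyms.length fails when no holonyms are found 
      if (holonyms != null) { 
       for (int i = 0; i < holonyms.length; i++) { 
        System.out.println(holonyms[i]); 
       } 
      } 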

      System.out.println("\n\nMeronyms for " + word + ":"); 
      String[] meronyms = wordnet.getAllMeronyms(word, pos); 
      if (meronyms != null) { 
       for (int i = 0; i < meronyms.length; i++) { 
        System.out.println(meronyms[i]); 
       } 
      } 

      System.out.println("\n\nAntonyms for " + word + ":"); 
      String[] antonyms = wordnet.getAllAntonyms(word, pos); 
      if (antonyms != null) { 
       for (int i = 0; i < antonyms.length; i++) { 
        System.out.println(antonyms[i]); 
       } 
      } 


      String start = "cameras"; 
      String end = "digital cameras"; 
      pos = wordnet.getBestPos(start); 

      // Wordnet can find relationships between words 
      System.out.println("\n\nRelationship between: " + start + " and " + end); 
      float dist = wordnet.getDistance(start, end, pos); 
      String[] parents = wordnet.getCommonParents(start, end, pos); 
      System.out.println(start + " and " + end + " are related by a distance of: " + dist); 

      // These words have common parents (hypernyms in this case) 
      System.out.println("Common parents: "); 
      if (parents != null) { 
       for (int i = 0; i < parents.length; i++) { 
        System.out.println(parents[i]); 
       } 
      } 

      //wordnet. 
      // System.out.println("\n\nHypernym Tree for " + start); 
      // int[] ids = wordnet.getSenseIds(start,wordnet.NOUN); 
      // wordnet.printHypernymTree(ids[0]); 
     } catch (Exception e) { 
      e.printStackTrace(); 
     } 
    } 
} 
+1

You could try JWI (the MIT Java WordNet Interface). It is easy to use; to get the holonyms or hypernyms you just iterate over them with an iterator. http://projects.csail.mit.edu/jwi/ – 2010-11-28 22:08:15

Answers

2

RiTa WordNet does provide an API for finding hypernyms (more general), hyponyms (less general), and synonyms. See the following page for details:

http://www.rednoise.org/rita/wordnet/documentation/index.htm

To learn what all of these terms (hypernym, etc.) mean, see the Wikipedia page for WordNet.
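
One way to turn hypernym lookups into the three relations asked about is to compare synsets: two words in the same synset are equivalent, and a word is more general than another if its synset appears somewhere up the other's hypernym chain. The sketch below only illustrates that logic. The class name, the enum, the synset labels, and the tiny taxonomy are all hypothetical stand-ins (not data read from WordNet, and not part of RiTa), and it assumes a single hypernym per synset, whereas real WordNet synsets can have several.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: classifies two words using a hand-written toy taxonomy.
final class RelationSketch {

    enum Relation { EQUIVALENT, MORE_GENERAL, LESS_GENERAL, UNRELATED }

    // Hypothetical toy data: word -> synset label, synset label -> hypernym synset label.
    static final Map<String, String> SYNSET_OF = new HashMap<>();
    static final Map<String, String> HYPERNYM_OF = new HashMap<>();
    static {
        SYNSET_OF.put("camera", "S_CAMERA");
        SYNSET_OF.put("photographic camera", "S_CAMERA");
        SYNSET_OF.put("digital camera", "S_DIGICAM");
        SYNSET_OF.put("equipment", "S_EQUIPMENT");
        SYNSET_OF.put("nurse", "S_NURSE");
        HYPERNYM_OF.put("S_DIGICAM", "S_CAMERA");
        HYPERNYM_OF.put("S_CAMERA", "S_EQUIPMENT");
    }

    // True if 'ancestor' is reachable from 'node' by following hypernym links upward.
    static boolean hasAncestor(String node, String ancestor, Map<String, String> hypernymOf) {
        String current = hypernymOf.get(node);
        int steps = 0;
        while (current != null && steps++ < 1000) { // step cap guards against accidental cycles
            if (current.equals(ancestor)) {
                return true;
            }
            current = hypernymOf.get(current);
        }
        return false;
    }

    // Relation of word 'a' with respect to word 'b'.
    static Relation relate(String a, String b,
                           Map<String, String> synsetOf, Map<String, String> hypernymOf) {
        String sa = synsetOf.get(a);
        String sb = synsetOf.get(b);
        if (sa == null || sb == null) {
            return Relation.UNRELATED; // unknown word
        }
        if (sa.equals(sb)) {
            return Relation.EQUIVALENT; // same synset, i.e. synonyms
        }
        if (hasAncestor(sb, sa, hypernymOf)) {
            return Relation.MORE_GENERAL; // a is a hypernym of b
        }
        if (hasAncestor(sa, sb, hypernymOf)) {
            return Relation.LESS_GENERAL; // a is a hyponym of b
        }
        return Relation.UNRELATED;
    }

    // Convenience wrapper over the toy maps above.
    static Relation relateToy(String a, String b) {
        return relate(a, b, SYNSET_OF, HYPERNYM_OF);
    }

    public static void main(String[] args) {
        System.out.println(relateToy("camera", "digital camera"));
        System.out.println(relateToy("digital camera", "equipment"));
        System.out.println(relateToy("camera", "photographic camera"));
    }
}
```

With a real library, the two maps would be replaced by synset and hypernym lookups; the comparison itself stays the same.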

+1

Yes.. but it throws an exception in most cases.. – karthi 2010-10-20 13:24:46
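
The exceptions are most likely NullPointerExceptions: as far as I can tell, the RiWordnet lookups return null rather than an empty array when nothing is found, so calling .length on the result fails. A minimal sketch of one way to guard against that follows. The helper class and method names are hypothetical, and the null array is only a stand-in for a lookup that found nothing.

```java
// Sketch only: a tiny helper that turns a null array into an empty one.
final class NullSafeArrays {

    // Returns the array itself, or an empty array if it is null.
    static String[] orEmpty(String[] values) {
        return values == null ? new String[0] : values;
    }

    public static void main(String[] args) {
        // Stand-in for a lookup result when no holonyms exist for the word.
        String[] holonyms = null;

        // The loop body is simply skipped instead of throwing.
        for (String holonym : orEmpty(holonyms)) {
            System.out.println(holonym);
        }
        System.out.println("holonyms found: " + orEmpty(holonyms).length);
    }
}
```

Wrapping each lookup result this way keeps the printing loops unchanged; an explicit `if (result != null)` check around each loop works just as well.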

0

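You could try parsing the database yourself. It is not that hard. 1) Find the word in the following files: index.noun, index.verb, index.adj, and index.adv. 2) Extract the IDs of its synsets ("senses"), and for each synset go to data.noun, data.verb, data.adj, or data.adv and extract the synset IDs of its hypernyms or hyponyms. Then look up those synset IDs to get their synonyms and glosses. This is easy if you use regular expressions.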
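
The steps above can be sketched roughly as follows. This assumes the field layout described in WordNet's database file documentation: an index line ends with its synset offsets, and a data line holds a hexadecimal word count, the words, a decimal pointer count, and then four fields per pointer, with `@` marking a hypernym and `~` a hyponym. The class name, the method names, and the two sample lines are made up for illustration (they are not copied from a real database). The sketch also ignores details such as the licence header lines at the top of each file and the instance pointers `@i` and `~i`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: parses single lines in the index.* / data.* layout.
final class WordnetLineSketch {

    // Made-up sample lines in the documented layout (not real WordNet entries).
    static final String SAMPLE_INDEX_LINE = "widget n 2 2 @ ~ 2 0 00000200 00000300";
    static final String SAMPLE_DATA_LINE =
        "00000200 06 n 02 widget 0 small_gadget 0 002 @ 00000100 n 0000 ~ 00000400 n 0000"
        + " | a made-up example entry";

    // index.*: lemma pos synset_cnt p_cnt [ptr_symbol...] sense_cnt tagsense_cnt synset_offset...
    static List<String> synsetOffsets(String indexLine) {
        String[] f = indexLine.trim().split("\\s+");
        int synsetCount = Integer.parseInt(f[2]);
        // The synset offsets are the last synset_cnt fields of the line.
        return new ArrayList<>(Arrays.asList(f).subList(f.length - synsetCount, f.length));
    }

    // Splits the part of a data.* line before the gloss into fields.
    private static String[] dataFields(String dataLine) {
        return dataLine.split("\\|", 2)[0].trim().split("\\s+");
    }

    // data.*: synset_offset lex_filenum ss_type w_cnt (word lex_id)... p_cnt (symbol offset pos src/tgt)...
    static List<String> pointerTargets(String dataLine, String symbol) {
        String[] f = dataFields(dataLine);
        int wordCount = Integer.parseInt(f[3], 16);        // w_cnt is hexadecimal
        int pCntIndex = 4 + 2 * wordCount;                 // skip the (word, lex_id) pairs
        int pointerCount = Integer.parseInt(f[pCntIndex]); // p_cnt is decimal
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < pointerCount; i++) {
            int base = pCntIndex + 1 + 4 * i;              // each pointer has four fields
            if (f[base].equals(symbol)) {
                targets.add(f[base + 1]);                  // offset of the related synset
            }
        }
        return targets;
    }

    // The words (synonyms) stored in one data.* line; underscores stand for spaces.
    static List<String> words(String dataLine) {
        String[] f = dataFields(dataLine);
        int wordCount = Integer.parseInt(f[3], 16);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < wordCount; i++) {
            result.add(f[4 + 2 * i].replace('_', ' '));
        }
        return result;
    }

    // The gloss is everything after the first '|'.
    static String gloss(String dataLine) {
        String[] parts = dataLine.split("\\|", 2);
        return parts.length == 2 ? parts[1].trim() : "";
    }

    public static void main(String[] args) {
        System.out.println("synsets:   " + synsetOffsets(SAMPLE_INDEX_LINE));
        System.out.println("words:     " + words(SAMPLE_DATA_LINE));
        System.out.println("hypernyms: " + pointerTargets(SAMPLE_DATA_LINE, "@"));
        System.out.println("hyponyms:  " + pointerTargets(SAMPLE_DATA_LINE, "~"));
        System.out.println("gloss:     " + gloss(SAMPLE_DATA_LINE));
    }
}
```

Splitting on whitespace is used here instead of one large regular expression because the number of fields varies per line. Multi-word entries such as "first name" are stored with underscores, so the lookup key would be first_name.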

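The database files (e.g. index.verb) can be found in one of WordNet's directories, which you can download from here. If you are on Linux, there is also a nice command-line program that does this job for you, but if you want to integrate it into Java code, I am afraid you will have to do all the parsing yourself. You may also find this link interesting. Hope this helps :)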

PS: You could also try NLTK (written in Python).