2015-02-24 76 views
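How to read the content of a child tag along with its parent tag in XML in Java

I am new to Java and am trying to write a program that fetches the meaning of a given word from the MW (Merriam-Webster) API. The output is XML, and I am currently using a DOM parser to print a list of all the definitions. Typically, the retrieved XML looks like this: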

<?xml version="1.0" encoding="utf-8" ?> 
<entry_list version="1.0"> 
    <entry id="dictionary"><ew>dictionary</ew><subj>PU-1#PU-2#PU-3#CP-4</subj><hw>dic*tio*nary</hw><sound><wav>dictio04.wav</wav></sound><pr>ˈdik-shə-ˌner-ē, -ˌne-rē</pr><fl>noun</fl><in><il>plural</il> <if>dic*tio*nar*ies</if></in><et>Medieval Latin <it>dictionarium,</it> from Late Latin <it>diction-, dictio</it> word, from Latin, speaking</et><def><date>1526</date> <sn>1</sn> <dt>:a reference source in print or electronic form containing words usually alphabetically arranged along with information about their forms, <d_link>pronunciations</d_link>, functions, <d_link>etymologies</d_link>, meanings, and <d_link>syntactical</d_link> and idiomatic uses</dt> <sn>2</sn> <dt>:a reference book listing alphabetically terms or names important to a particular subject or activity along with discussion of their meanings and <d_link>applications</d_link></dt> <sn>3</sn> <dt>:a reference book listing alphabetically the words of one language and showing their meanings or translations in another language</dt> <sn>4</sn> <dt>:a <d_link>computerized</d_link> list (as of items of data or words) used for reference (as for information retrieval or word processing)</dt></def></entry> 
</entry_list> 

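The definitions are enclosed in <dt> tags.

The problem I am facing is that a <dt> tag can contain another child tag, <d_link>. Whenever the DOM parser runs across this child tag, getNodeValue() behaves as if it had reached the end of the <dt> tag.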

My code is as follows:

import org.w3c.dom.*;
import javax.xml.parsers.*;

public class Dictionary5 {
    public static void main(String[] args) throws Exception {
        String head = new String("http://www.dictionaryapi.com/api/v1/references/collegiate/xml/");
        String word = new String("banal");
        String apiKey = new String("?key=xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx"); // My API key for Merriam-Webster
        String finalURL = head.trim() + word.trim() + apiKey.trim();
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            DocumentBuilder b = f.newDocumentBuilder();
            Document doc = b.parse(finalURL);

            doc.getDocumentElement().normalize();

            NodeList items = doc.getElementsByTagName("entry");
            for (int i = 0; i < items.getLength(); i++) {
                Node n = items.item(i);

                if (n.getNodeType() != Node.ELEMENT_NODE)
                    continue;

                Element e = (Element) n;
                NodeList titleList = e.getElementsByTagName("dt");
                for (int j = 0; j < titleList.getLength(); j++) {
                    Node dt = titleList.item(j);
                    if (dt.getNodeType() != Node.ELEMENT_NODE)
                        continue;
                    Element titleElem = (Element) titleList.item(j);
                    Node titleNode = titleElem.getChildNodes().item(0);
                    System.out.println(titleNode.getNodeValue());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The output is as follows:

:a reference source in print or electronic form containing words usually alphabetically arranged along with information about their forms, 
:a reference book listing alphabetically terms or names important to a particular subject or activity along with discussion of their meanings and 
:a reference book listing alphabetically the words of one language and showing their meanings or translations in another language 
:a 

As you can see, the first, second, and fourth definitions end abruptly because the parser encounters the child tag <d_link>.

My expected output is as follows:

:a reference source in print or electronic form containing words usually alphabetically arranged along with information about their forms, pronunciations, functions, etymologies, meanings, and syntactical and idiomatic uses 
:a reference book listing alphabetically terms or names important to a particular subject or activity along with discussion of their meanings and applications 
:a reference book listing alphabetically the words of one language and showing their meanings or translations in another language 
:a computerized list (as of items of data or words) used for reference (as for information retrieval or word processing) 

Could someone please help me with this? Any help is highly appreciated. Thanks in advance.

Answer

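In the DOM model, the content of a dt element is a sequence of child nodes: a text node, a d_link element, a text node, a d_link element, and so on.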

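So you need to concatenate all of the text nodes together (and, it seems, the content of the d_link elements as well). You are reading only the first child, titleElem.getChildNodes().item(0), which is why the output ends "abruptly".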
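
A minimal sketch of that concatenation, under a few assumptions: the class name DtTextExample and the helpers collectText and definitions are hypothetical names chosen for illustration, and the XML is a shortened inline sample rather than a live API response. The helper walks every child of a dt element, appends text nodes directly, and recurses into child elements such as d_link.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DtTextExample {

    // Hypothetical helper: concatenates all text found under a node,
    // descending into child elements such as <d_link>.
    static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                sb.append(collectText(child));
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: returns the full text of every <dt> element in the XML.
    static List<String> definitions(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList dts = doc.getElementsByTagName("dt");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < dts.getLength(); i++) {
                result.add(collectText(dts.item(i)));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened sample modeled on the fourth definition in the question.
        String xml = "<entry_list><entry id=\"dictionary\"><def>"
                + "<sn>4</sn> <dt>:a <d_link>computerized</d_link> list (as of items of data or words)</dt>"
                + "</def></entry></entry_list>";
        for (String definition : definitions(xml)) {
            System.out.println(definition);
        }
    }
}
```

For this use case, the standard Node.getTextContent() method should give the same string without a hand-written loop, so in the question's code titleElem.getTextContent() could replace the getChildNodes().item(0).getNodeValue() call. The explicit loop is mainly useful if some child elements need to be skipped or formatted differently.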

Thanks for the reply. Any pointers on how to get the number of items and loop over them to concatenate all the text into a single string? – Naveen 2015-02-28 18:30:08
