2010-10-22

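Using Java's XPath to loop through nodes and extract specific child node values

From Googling, I understand that extracting data from XML with XPath makes more sense than looping over it with the DOM.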

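I have currently implemented a solution using the DOM, but the code is verbose and feels untidy and unmaintainable, so I would like to switch to a cleaner XPath solution.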

Let's say I have this structure:

<products> 
    <product> 
     <title>Some title 1</title> 
     <image>Some image 1</image> 
    </product> 
    <product> 
     <title>Some title 2</title> 
     <image>Some image 2</image> 
    </product> 
    ... 
</products> 

I want to run a for loop over each <product> element and, inside that loop, extract the title and image node values.

My code looks like this:

InputStream is = conn.getInputStream();   
DocumentBuilder builder = 
    DocumentBuilderFactory.newInstance().newDocumentBuilder(); 
Document doc = builder.parse(is); 
XPathFactory factory = XPathFactory.newInstance(); 
XPath xpath = factory.newXPath(); 
XPathExpression expr = xpath.compile("/products/product"); 
Object result = expr.evaluate(doc, XPathConstants.NODESET); 
NodeList products = (NodeList) result; 
for (int i = 0; i < products.getLength(); i++) { 
    Node n = products.item(i); 
    if (n != null && n.getNodeType() == Node.ELEMENT_NODE) { 
     Element product = (Element) n; 
     // do some DOM navigation to get the title and image 
    } 
} 

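Inside my for loop I have one <product> Node at a time, which is cast to an Element.

Can I simply use an instance of my XPathExpression to compile and run another XPath on the Node or the Element?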

Answers

Yes, you can always do it like this:

XPathFactory factory = XPathFactory.newInstance(); 
XPath xpath = factory.newXPath(); 
XPathExpression expr = xpath.compile("/products/product"); 
Object result = expr.evaluate(doc, XPathConstants.NODESET); 
expr = xpath.compile("title"); // The new xpath expression to find 'title' within 'product'. 

NodeList products = (NodeList) result; 
for (int i = 0; i < products.getLength(); i++) { 
    Node n = products.item(i); 
    if (n != null && n.getNodeType() == Node.ELEMENT_NODE) { 
     Element product = (Element) n; 
     NodeList nodes = (NodeList) expr.evaluate(product, XPathConstants.NODESET); // Find the 'title' in the 'product' 
     if (nodes.getLength() > 0) { // Guard against a <product> that has no <title> 
      System.out.println("TITLE: " + nodes.item(0).getTextContent()); // And here is the title 
     } 
    } 
}  

Here I've given an example of extracting the 'title' value. You can do the same for 'image'.
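A self-contained sketch of the same idea, handling both fields, might look like the following. It assumes the XML is parsed from an inline string instead of `conn.getInputStream()`, and the class name `ProductXPathDemo`, the `extract` helper and the `title|image` row format are hypothetical, chosen only for illustration. It also swaps in `XPathConstants.STRING`, which returns the text value directly, instead of fetching a NODESET and calling `getTextContent()`. Because `XPathExpression.evaluate` accepts any `Node` as its context item, no cast to `Element` is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ProductXPathDemo {
    // Hypothetical sample input standing in for conn.getInputStream().
    static final String XML =
        "<products>"
        + "<product><title>Some title 1</title><image>Some image 1</image></product>"
        + "<product><title>Some title 2</title><image>Some image 2</image></product>"
        + "</products>";

    /** Returns one "title|image" string per <product>, in document order. */
    static List<String> extract(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Compile every expression once, outside the loop.
        XPathExpression productsExpr = xpath.compile("/products/product");
        // Relative paths: evaluated with the current <product> node as the context.
        XPathExpression titleExpr = xpath.compile("title");
        XPathExpression imageExpr = xpath.compile("image");

        NodeList products = (NodeList) productsExpr.evaluate(doc, XPathConstants.NODESET);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < products.getLength(); i++) {
            Node product = products.item(i);
            // STRING yields the text value directly, or "" if the child is missing.
            String title = (String) titleExpr.evaluate(product, XPathConstants.STRING);
            String image = (String) imageExpr.evaluate(product, XPathConstants.STRING);
            rows.add(title + "|" + image);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (String row : extract(XML)) {
            System.out.println(row);
        }
    }
}
```

Running `main` prints `Some title 1|Some image 1` and then `Some title 2|Some image 2`. Keep the child expressions relative: a path starting with `/` or `//` (for example `//title`) is resolved from the document root rather than from the current `<product>`, so every iteration would return the first title in the whole document.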

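I'm not a big fan of this approach, because you have to build a Document (which can be expensive) before you can apply XPath to it.

I've found VTD-XML much more efficient for applying XPath to documents, because it does not build a full DOM object tree in memory. Here is some sample code: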

final VTDGen vg = new VTDGen(); 
vg.parseFile("file.xml", false); 
final VTDNav vn = vg.getNav(); 
final AutoPilot ap = new AutoPilot(vn); 

ap.selectXPath("/products/product"); 
while (ap.evalXPath() != -1) { 
    System.out.println("PRODUCT:"); 

    // you could either apply another xpath or simply get the first child 
    if (vn.toElement(VTDNav.FIRST_CHILD, "title")) { 
     int val = vn.getText(); 
     if (val != -1) { 
      System.out.println("Title: " + vn.toNormalizedString(val)); 
     } 
     vn.toElement(VTDNav.PARENT); 
    } 
    if (vn.toElement(VTDNav.FIRST_CHILD, "image")) { 
     int val = vn.getText(); 
     if (val != -1) { 
      System.out.println("Image: " + vn.toNormalizedString(val)); 
     } 
     vn.toElement(VTDNav.PARENT); 
    } 
} 

See also this article: Faster XPaths with VTD-XML