2016-12-26 31 views

XPath 2.0 functions not working in Java using Saxon

I am writing a simple program that uses Selenium and XPath 2.0 functions to scrape data from a web page.

Because Selenium only supports XPath 1.0 functions, I want to use Saxon.jar.

  1. I downloaded Saxon9he.jar and extracted it to the path "C:\Program Files\Java\jre1.8.0_111\lib\ext".
  2. I created a file "jaxp.properties" with the following lines: javax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl javax.xml.xpath.XPathFactory","net.sf.saxon.xpath.XPathFactoryImpl
  3. I also added the JAR file to the project libraries in Eclipse.

However, I cannot get any values back when I use XPath 2.0 functions.

In my code, if I use

XPathFactory factory = XPathFactory.newInstance(); 

instead of

XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON); 

then XPath 1.0 functions work. But I need XPath 2.0 functions. Please guide me here.

My code is:

import java.io.IOException; 
import java.io.StringReader; 

import javax.xml.parsers.DocumentBuilder; 
import javax.xml.parsers.DocumentBuilderFactory; 
import javax.xml.parsers.ParserConfigurationException; 
import javax.xml.xpath.XPath; 
import javax.xml.xpath.XPathConstants; 
import javax.xml.xpath.XPathExpression; 
import javax.xml.xpath.XPathExpressionException; 
import javax.xml.xpath.XPathFactory; 
import javax.xml.xpath.XPathFactoryConfigurationException; 
import javax.xml.xpath.XPathFunctionResolver; 
import javax.xml.xpath.XPathVariableResolver; 

import org.openqa.selenium.WebDriver; 
import org.openqa.selenium.firefox.FirefoxDriver; 
import org.w3c.dom.Document; 
import org.w3c.dom.NodeList; 
import org.xml.sax.InputSource; 
import org.xml.sax.SAXException; 

import net.sf.saxon.lib.NamespaceConstant; 
import net.sf.saxon.xpath.XPathFactoryImpl; 


public class XpathCheckClass {

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException,
            XPathFactoryConfigurationException, XPathExpressionException {

        WebDriver dr = new FirefoxDriver();

        dr.get("http://s15.a2zinc.net/clients/hartenergy/midstream17/Public/eBooth.aspx?Nav=False&BoothID=137384");
        try {
            // Give the page time to load.
            Thread.sleep(3000);
        } catch (Exception e) {
        }

        String source = dr.getPageSource();

        Document doc = null;

        try {
            // Parse the page source into a DOM document.
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = db.parse(new InputSource(new StringReader(source)));
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.setProperty("javax.xml.xpath.XPathFactory:" + NamespaceConstant.OBJECT_MODEL_SAXON, "net.sf.saxon.xpath.XPathFactoryImpl");
        XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);

        // XPathFactory factory = XPathFactory.newInstance(); ---> default xpath factory

        XPath xpath = factory.newXPath();
        XPathExpression expr = xpath.compile("if(//h2) then //h2 else //h1");

        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

        System.out.println(nodes.getLength());

        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getTextContent());
        }

        dr.close();
    }

}

Answer


Recent versions of Saxon no longer advertise themselves as a JAXP XPath service, so you need to instantiate the XPath factory explicitly:

XPathFactory xf = new net.sf.saxon.xpath.XPathFactoryImpl();
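
The explicit instantiation above could be wrapped roughly as follows. This is only a sketch: the class name `ExplicitSaxonFactory` and the helper `createFactory` are made up for illustration, and the factory is loaded reflectively purely so the example still compiles and runs when the Saxon JAR is absent. With Saxon on the classpath, a direct `new` of the factory class (the one the question already imports) should behave the same way.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

public class ExplicitSaxonFactory {

    // Fully qualified name of Saxon's JAXP XPath factory, as imported in the question.
    static final String SAXON_FACTORY = "net.sf.saxon.xpath.XPathFactoryImpl";

    /** Hypothetical helper: Saxon's factory if it is on the classpath, else the JDK default. */
    static XPathFactory createFactory() {
        try {
            // Bypasses the JAXP service lookup entirely by constructing the class directly.
            return (XPathFactory) Class.forName(SAXON_FACTORY)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            // Saxon is not available: fall back to the built-in XPath 1.0 factory.
            return XPathFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        XPathFactory factory = createFactory();
        XPath xpath = factory.newXPath();
        // Shows which implementation was actually picked up.
        System.out.println("Factory: " + factory.getClass().getName());
        System.out.println("XPath:   " + xpath.getClass().getName());
    }
}
```

Printing the implementation class is a quick way to confirm whether Saxon or the JDK's built-in processor is the one evaluating the expressions.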

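I'd like to add a bit of explanation: the reason the Saxon JAR no longer exposes itself as an XPath processor is that too many applications written and tested against XPath 1.0 were picking it up by accident. Unfortunately, the JAXP interface provides no way to say "please find me an XPath 2.0 processor".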
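
To see which kind of processor the plain JAXP lookup returns, a small check along these lines may help. It is a sketch under the assumption that no third-party XPath factory is registered, so `XPathFactory.newInstance()` yields the JDK's built-in XPath 1.0 implementation. The class name `DefaultFactoryCheck` and the helper `compiles` are hypothetical.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

public class DefaultFactoryCheck {

    /** Hypothetical helper: true if the given XPath implementation accepts the expression. */
    static boolean compiles(XPath xpath, String expression) {
        try {
            xpath.compile(expression);
            return true;
        } catch (XPathExpressionException | RuntimeException e) {
            // Syntax the processor does not understand is reported at compile time.
            return false;
        }
    }

    public static void main(String[] args) {
        // Whatever the default JAXP lookup finds; assumed here to be the JDK's own processor.
        XPath xpath = XPathFactory.newInstance().newXPath();

        // A union is valid XPath 1.0.
        System.out.println("1.0 union:       " + compiles(xpath, "//h2 | //h1"));

        // The conditional from the question is XPath 2.0 syntax.
        System.out.println("2.0 conditional: " + compiles(xpath, "if (//h2) then //h2 else //h1"));
    }
}
```

Running the same check against an explicitly constructed Saxon factory is one way to confirm that the XPath 2.0 expression is then accepted.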