2012-02-17

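I'm using the code below to fetch the first two pages of Google search results, but I can only scrape the first page (the "page 2" I get back is identical to page 1). How can I get the "Next" page on Google with HtmlUnit?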

import com.gargoylesoftware.htmlunit.WebClient; 
import com.gargoylesoftware.htmlunit.html.HtmlElement; 
import com.gargoylesoftware.htmlunit.html.HtmlPage; 
import com.gargoylesoftware.htmlunit.html.HtmlTextInput; 


/** 
* A simple Google search test using HtmlUnit. 
* 
* @author Rahul Poonekar 
* @since Apr 18, 2010 
*/ 
public class Author_search {
    static final WebClient browser;

    static {
        browser = new WebClient();
        // Fetch pages without running their JavaScript
        browser.setJavaScriptEnabled(false);
    }

    public static void main(String[] arguments) {
        searchTest();
    }

    private static void searchTest() {
        HtmlPage currentPage;

        try {
            currentPage = (HtmlPage) browser.getPage("http://www.google.com");
        } catch (Exception e) {
            System.out.println("Could not open browser window");
            e.printStackTrace();
            return; // nothing to search on if the home page did not load
        }
        System.out.println("Simulated browser opened.");

        try {
            // Type the query into the search box and submit the form
            ((HtmlTextInput) currentPage.getElementByName("q")).setValueAttribute("xxoo");
            currentPage = currentPage.getElementByName("btnG").click();
            System.out.println("contents: " + currentPage.asText());

            // Click the <span> whose text contains "Next" to reach results page 2
            HtmlElement next = (HtmlElement) currentPage.getByXPath("//span[contains(text(), 'Next')]").get(0);
            currentPage = next.click();
            System.out.println("contents: " + currentPage.asText());
        } catch (Exception e) {
            System.out.println("Could not search");
            e.printStackTrace();
        }
    }
}

Can someone tell me how to fix this?

By the way:

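  1. How can I change Google's language setting with HtmlUnit? Is there a convenient way?
  2. Does HtmlUnit treat the HTML the way Firebug shows it in Firefox (the live document), or just as the text you get from "File -> Save"? I believe it treats it the way a browser does. Am I right?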
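As an alternative to clicking "Next" (and a possible answer to question 1), the page and the language could be put into the search URL itself. Google's result URLs have long used a `start` parameter (zero-based offset of the first result) and an `hl` parameter (interface language), but these are not a documented or stable API, so treat this as a sketch under assumptions. `GoogleSearchUrl` and `build` are hypothetical names, the `/search` path and a page size of 10 are assumptions, and the code uses only the JDK (Java 10+ for `URLEncoder.encode(String, Charset)`).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of HtmlUnit) that builds a Google result-page URL.
 * Assumes Google honours the undocumented "hl" and "start" query parameters.
 */
final class GoogleSearchUrl {
    // Assumed default number of results per page
    private static final int RESULTS_PER_PAGE = 10;

    /** pageNumber is 1-based: page 1 -> start=0, page 2 -> start=10, and so on. */
    static String build(String query, String language, int pageNumber) {
        if (pageNumber < 1) {
            throw new IllegalArgumentException("pageNumber must be >= 1");
        }
        int start = (pageNumber - 1) * RESULTS_PER_PAGE;
        return "http://www.google.com/search"
                + "?q=" + encode(query)
                + "&hl=" + encode(language)
                + "&start=" + start;
    }

    // Form-encodes a query value (spaces become '+')
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Print the URLs for the first two English-language result pages
        for (int page = 1; page <= 2; page++) {
            System.out.println(build("xxoo", "en", page));
        }
    }
}
```

With HtmlUnit, such a string could be passed to `browser.getPage(...)` instead of clicking the "Next" element; whether Google keeps honouring these parameters for an HtmlUnit client is not guaranteed.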

Answer


I replaced:

HtmlElement next = (HtmlElement)currentPage.getByXPath("//span[contains(text(),'Next')]").get(0); 
currentPage = next.click(); 

with:

HtmlAnchor nextAnchor = currentPage.getAnchorByText("Next");
currentPage = nextAnchor.click();

This needs an import: com.gargoylesoftware.htmlunit.html.HtmlAnchor – 2012-02-29 00:03:55