2012-04-11 100 views
1

I want to parse a web page using HtmlUnit. When I run the code below, I get the following error: unknown host www.google.com.

import java.net.URL; 
import java.util.List; 

import com.gargoylesoftware.htmlunit.WebClient; 
import com.gargoylesoftware.htmlunit.html.HtmlImage; 
import com.gargoylesoftware.htmlunit.html.HtmlPage; 

public class scrapImage {

    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.google.com");
        // WebClient webClient = new WebClient(Opera);
        WebClient webClient = new WebClient();
        HtmlPage currentPage = (HtmlPage) webClient.getPage(url);
        // Get a list of all <img> elements on the page
        final List<?> images = currentPage.getByXPath("//img");
        for (Object imageObject : images) {
            HtmlImage image = (HtmlImage) imageObject;
            System.out.println(image.getSrcAttribute());
        }
        // webClient.closeAllWindows();
    }
}

Error message:

Exception in thread "main" java.net.UnknownHostException: www.google.com 
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:196) 
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:377) 
    at java.net.Socket.connect(Socket.java:530) 
    at java.net.Socket.connect(Socket.java:480) 
    at java.net.Socket.<init>(Socket.java:377) 
    at java.net.Socket.<init>(Socket.java:251) 
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80) 
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122) 
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) 
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) 
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) 
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) 
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) 
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346) 
    at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:97) 
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1430) 
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1388) 
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:325) 
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:386) 
    at htmlunit.scrapImage.main(scrapImage.java:16) 

Can anyone tell me how to resolve the exception above?

Answers

1

I think the problem is with your network connection, or a firewall may be blocking the Java program from accessing the Internet.
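One way to narrow this down is to check whether the JVM itself can resolve the host name, independent of HtmlUnit. The following is a minimal sketch using only the standard library; the class name `DnsCheck` and the helper `canResolve` are made up for illustration and are not part of HtmlUnit.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper (not part of HtmlUnit): checks whether this JVM
// can resolve a host name through the system resolver.
class DnsCheck {

    // Returns true if the name resolves, false if the lookup fails
    // with UnknownHostException.
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the host from the question; pass another name as an argument.
        String host = args.length > 0 ? args[0] : "www.google.com";
        System.out.println(host + " resolvable from this JVM: " + canResolve(host));
    }
}
```

If this reports that the host is not resolvable while a browser on the same machine can open the page, the browser is probably going through a proxy that the Java program is not configured to use.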

1

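I think you are behind a proxy or a firewall. Check the current firewall status on your system. If the problem is proxy-related, you can modify the code like this.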

System.getProperties().put("proxySet", "true"); 
System.getProperties().put("proxyHost", "your proxy host name"); 
System.getProperties().put("proxyPort", "85"); 

Maybe this will help you.
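For comparison, the property names documented for the JDK's own HTTP stack are `http.proxyHost` and `http.proxyPort`. The sketch below sets them and then asks the default `ProxySelector` which proxy it would pick for the URL from the question. The class name `ProxyProperties` and the host `proxy.example.com` with port 8080 are placeholders, not values from this thread. Whether HtmlUnit honors these properties depends on its version and the HTTP client underneath it, so treat that as an assumption to verify; `WebClient` also has a constructor that takes a proxy host and port directly.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

// Hypothetical helper: sets the standard JVM proxy properties and
// shows what the default ProxySelector makes of them.
class ProxyProperties {

    // Sets the JVM-wide HTTP proxy properties.
    static void useHttpProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
    }

    // Asks the default ProxySelector which proxies it would use for the URL.
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        useHttpProxy("proxy.example.com", 8080); // placeholder values
        for (Proxy proxy : proxiesFor("http://www.google.com/")) {
            System.out.println(proxy);
        }
    }
}
```

Printing the selected proxies before making any request is a cheap way to confirm that the settings were picked up at all.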

+0

When I try to get the proxy name with InetSocketAddress addr = (InetSocketAddress) proxy.address(); and System.out.println("proxy hostname: " + addr.getHostName());, I get a NullPointerException, because addr itself is null and proxy is null. Please guide me. – developer 2012-04-11 05:45:44

+0

Give the name of your proxy server, since all requests are routed through it. Use your proxy's IP address in place of the placeholder above. – UVM 2012-04-11 05:55:33
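The null address described in the first comment is what `java.net.Proxy` returns for a direct connection: `Proxy.address()` is documented to return null when no proxy is in use. A minimal sketch of printing proxy details without hitting the NullPointerException follows; the class name `ProxyDescriber` and the example host are illustrative only.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper: describes a java.net.Proxy without assuming
// that address() is non-null.
class ProxyDescriber {

    static String describe(Proxy proxy) {
        // A direct connection has no address, so check before dereferencing.
        if (proxy == null || proxy.type() == Proxy.Type.DIRECT) {
            return "direct connection (no proxy)";
        }
        InetSocketAddress addr = (InetSocketAddress) proxy.address();
        return proxy.type() + " proxy " + addr.getHostString() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        System.out.println(describe(Proxy.NO_PROXY));
        Proxy example = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved("proxy.example.com", 8080)); // placeholder
        System.out.println(describe(example));
    }
}
```

Seeing "direct connection" here means the JVM has no proxy configured, which matches the symptom in the comment: the proxy host and port still have to be supplied explicitly.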

1

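It seems there is some trouble with your Internet connection, or you are behind a proxy.

If that is the case, configure the proxy settings (host/port/username/password).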
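For the username and password part, the plain-JDK mechanism is `java.net.Authenticator`. The sketch below registers credentials that are handed out only when a proxy asks for them. The class name `ProxyAuth` and the credentials are placeholders. This applies to the JDK's own HTTP classes; HtmlUnit is expected to manage proxy credentials through its own API, so check the documentation for the version you use rather than assuming this carries over.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical helper: installs JVM-wide credentials for proxy authentication.
class ProxyAuth {

    static void install(final String user, final String password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Answer only when the request comes from a proxy.
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(user, password.toCharArray());
                }
                return null; // do not hand proxy credentials to ordinary servers
            }
        });
    }

    public static void main(String[] args) {
        install("proxyUser", "proxyPassword"); // placeholder credentials
        System.out.println("Proxy authenticator installed");
    }
}
```

Restricting the answer to `RequestorType.PROXY` keeps the proxy password from being offered to any web server that happens to request authentication.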