Get page content from a URL?

I want to use this code to get the content of the page at a URL:

public static String getContentResult(URL url) throws IOException{ 

    InputStream in = url.openStream(); 
    StringBuffer sb = new StringBuffer(); 

    byte [] buffer = new byte[256]; 

    while(true){ 
     int byteRead = in.read(buffer); 
     if(byteRead == -1) 
      break; 
     for(int i = 0; i < byteRead; i++){ 
      sb.append((char)buffer[i]); 
     } 
    } 
    return sb.toString(); 
}

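But with this URL: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315 I cannot get the abstract: "Database management systems will continue to manage....."

Can you give me a solution to this problem? Thanks in advance.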

Source

2010-11-18 tiendv

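Possible duplicate: http://stackoverflow.com/questions/1255730/java-retrieve-html-page-in-proper-encoding – 2010-11-18 15:32:01

@Matt Ball The problem here is that the OP needs to execute JavaScript to get the desired content, so in that sense this question is fundamentally different. – 2010-11-18 15:33:36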
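Independently of the redirect and JavaScript points raised in this thread, the `(char)buffer[i]` cast in the question's code treats every byte as one character, which only works for ASCII. The sketch below contrasts that cast with decoding through an `InputStreamReader`. The class and method names (`DecodeDemo`, `castBytes`, `decode`) are made up for illustration, and the choice of UTF-8 is an assumption that should be replaced by whatever charset the server actually declares.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original question or answers.
final class DecodeDemo {

    // Mirrors the question's approach: one byte becomes one char.
    // Multi-byte sequences (for example UTF-8 "é") come out garbled.
    static String castBytes(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Decodes the stream with an explicit charset. The Reader also copes
    // with multi-byte characters that straddle a buffer boundary.
    static String decode(InputStream in, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int read;
            while ((read = reader.read(buf)) != -1) {
                sb.append(buf, 0, read);
            }
            return sb.toString();
        }
    }
}
```

With this sketch, `DecodeDemo.decode(url.openStream(), StandardCharsets.UTF_8)` could stand in for the loop in the question. It does not help with content that is only loaded by JavaScript.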

Headers returned for the GET request (1.4.3):

HTTP/1.1 302 Moved Temporarily 
Connection: close 
Date: Thu, 18 Nov 2010 15:35:24 GMT 
Server: Microsoft-IIS/6.0 
location: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE 
Content-Type: text/html; charset=UTF-8

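This means the server wants you to fetch the resource from the new location. So either read the headers from the URLConnection yourself and follow the link, or use HttpClient, which follows redirects automatically. The code below is based on HttpClient: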

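import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

// Requires Apache HttpClient 4.x (httpclient and httpcore) on the classpath.
public class HttpTest {
    public static void main(String... args) throws Exception {

        System.out.println(readPage(new URL("http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315")));
    }

    private static String readPage(URL url) throws Exception {

        // HttpClient follows the 302 redirect on its own.
        DefaultHttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url.toURI());
        HttpResponse response = client.execute(request);

        Reader reader = null;
        try {
            // Decode as UTF-8, the charset announced in the response headers above.
            reader = new InputStreamReader(response.getEntity().getContent(), "UTF-8");

            StringBuilder sb = new StringBuilder();
            int read;
            char[] cbuf = new char[1024];
            while ((read = reader.read(cbuf)) != -1) {
                sb.append(cbuf, 0, read);
            }

            return sb.toString();

        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}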
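The other option mentioned above, reading the headers yourself and following the `Location` link, can be sketched with only the JDK. This is an illustration under stated assumptions, not the answerer's code: the class name `RedirectFollower`, the hop limit `MAX_REDIRECTS`, and the UTF-8 decoding are all choices made for this example, and the URL is assumed to be an `http` or `https` one.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: follows HTTP redirects by hand.
final class RedirectFollower {

    // Assumed cap on redirect hops, picked only for illustration.
    static final int MAX_REDIRECTS = 5;

    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a Location value (absolute or relative) against the current URL.
    static URL resolve(URL current, String location) throws IOException {
        return new URL(current, location);
    }

    static String readPage(URL url) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            // Assumes an http/https URL; other schemes would fail this cast.
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // handle redirects ourselves
            int status = conn.getResponseCode();
            if (isRedirect(status)) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header");
                }
                current = resolve(current, location);
                continue;
            }
            // Not a redirect: read the body, assuming UTF-8.
            try (Reader reader = new InputStreamReader(
                    conn.getInputStream(), StandardCharsets.UTF_8)) {
                StringBuilder sb = new StringBuilder();
                char[] buf = new char[1024];
                int read;
                while ((read = reader.read(buf)) != -1) {
                    sb.append(buf, 0, read);
                }
                return sb.toString();
            }
        }
        throw new IOException("Too many redirects");
    }
}
```

Calling `RedirectFollower.readPage(new URL(...))` with the URL from the question would request the original address, read the 302 response, and then request the address given in its `location` header.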

Source

2010-11-18 15:36:27 dacwe

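Can you say which libraries this code uses? I cannot run it with Apache's httpcore! – tiendv 2010-11-19 07:02:24

I can run your code now! But the result is the same as with my code. Can you give me any suggestions? – tiendv 2010-11-21 15:14:06

@tiendv: I just tried this code and got the redirected page as expected. What are you trying to get? – dacwe 2010-11-21 16:54:42

There is no "Database management..." text at the given URL. Perhaps it is loaded dynamically by JavaScript. You would need a more sophisticated application to download that kind of content ;)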

Source

2010-11-18 15:33:58

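The content you are looking for is not contained in the page at this URL. Open it in a browser and view the source: what you will find instead is a lot of JavaScript files being loaded. I think the content is fetched later by an AJAX call. You need to find out how the content is loaded.

The Firefox plugin Firebug may help with a more detailed analysis.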
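The "view the source" check can also be done in code once the HTML has been downloaded. The following is a rough sketch with made-up names (`StaticContentCheck`, `containsPhrase`, `countScriptTags`). It only inspects the static markup and does not execute any JavaScript.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for inspecting already-downloaded HTML.
final class StaticContentCheck {

    // Matches opening <script ...> tags only, in any letter case.
    private static final Pattern SCRIPT_TAG =
            Pattern.compile("<script\\b", Pattern.CASE_INSENSITIVE);

    // True if the phrase appears in the raw HTML, ignoring case.
    static boolean containsPhrase(String html, String phrase) {
        return html.toLowerCase(Locale.ROOT)
                .contains(phrase.toLowerCase(Locale.ROOT));
    }

    // Counts opening script tags as a crude hint that content
    // may be loaded dynamically.
    static int countScriptTags(String html) {
        Matcher m = SCRIPT_TAG.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

If `containsPhrase(html, "Database management")` returns false while `countScriptTags(html)` is large, that is consistent with the content being fetched later by a script, although it does not prove it.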

Source

2010-11-18 15:34:05 stacker

The URL you should use is:

http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE

because the original URL you posted sends a redirect (as dacwe mentioned).
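Comparing the two URLs, the redirect target is the original address without the `CFID` and `CFTOKEN` parameters. Assuming those two are per-session values that can safely be dropped (an assumption, not something the server documents here), a small helper could remove them before the request. The class and method names below are invented for this sketch, and URL fragments are not handled.

```java
// Hypothetical helper class: removes named parameters from a query string.
final class SessionParams {

    // Returns the URL without the given query parameters (names compared
    // case-insensitively). Other parameters keep their original order.
    static String stripParams(String url, String... names) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to strip
        }
        String base = url.substring(0, q);
        StringBuilder kept = new StringBuilder();
        for (String pair : url.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            boolean drop = false;
            for (String name : names) {
                if (name.equalsIgnoreCase(key)) {
                    drop = true;
                    break;
                }
            }
            if (drop) {
                continue;
            }
            if (kept.length() > 0) {
                kept.append('&');
            }
            kept.append(pair);
        }
        return kept.length() == 0 ? base : base + "?" + kept;
    }
}
```

For the URL in the question, `SessionParams.stripParams(url, "CFID", "CFTOKEN")` yields the shorter address shown above.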

Source

2010-11-18 15:40:45 user3111525
