2016-03-07

The HTML content my method reads from a specific page differs from the actual page

http://www.centerplex.com.br/

My method:

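import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public String getHtml(String urlStr, String charset) throws Exception {
    // Route requests through the proxy (placeholder values).
    System.setProperty("http.proxyHost", "XXX.XX.X.XXX");
    System.setProperty("http.proxyPort", "XXXX");
    URL url = new URL(urlStr);
    URLConnection conn = url.openConnection();
    StringBuilder html = new StringBuilder();
    // Read from the connection opened above instead of opening a second one
    // with url.openStream(); try-with-resources closes the stream.
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), charset))) {
        String linha = br.readLine();
        while (linha != null) {
            System.out.println(linha);
            // Keep the line breaks so the returned markup matches what was read.
            html.append(linha).append('\n');
            linha = br.readLine();
        }
    }

    return html.toString();
}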

This method works for other pages, but for this one it gives me incomplete HTML.

I see a lot of JavaScript on that page, but I don't know whether it has any influence.

Here is the HTML returned for this page:

<!doctype html> 
<html> 
    <head> 
     <title>Centerplex Cinemas</title> 
     <meta charset="iso-8859-1"> 
     <meta name="description" content=""> 
     <meta name="keywords" content=""> 
     <meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;"> 
     <link href="apple-touch-icon.png" rel="apple-touch-icon" type="image/png"> 
     <link href="lib/css/estilo.css" rel="stylesheet" type="text/css"> 
    </head> 
    <body> 


       <div class="tematizacao"> 
        <iframe src="//www.youtube.com/embed/" class="trailer" frameborder="0" allowfullscreen></iframe> 
        <img src="http://www.centerplex.com.br/fotos/wallpaper_mobile/470.jpg" /> 
       </div> 



      <div class="header"> 

    <h1><a href="index.php" title="Centerplex">Centerplex</a></h1> 

    </div>  <div class="efilme"> 
      <a href="http://www.centerplex.com.br/mobile/filme.php?cf=5807" title="Kung Fu Panda 3"><img src="http://www.centerplex.com.br/fotos/hp_mobile/188.jpg" title="Kung Fu Panda 3" alt="Kung Fu Panda 3" width="100%"></a> 
        </div> 
     <ul class="nav"> 
      <li><a href="lancamentos.php" title="Estreias/Em Cartaz">Estreias/Em Cartaz</a></li> 
      <li><a href="salas-horarios.php" title="Salas & Horários">Salas & Horários</a></li> 
        </ul> 
      <ul class="fnav"> 

     <li><a href="breve.php" title="Em Breve" class="breve">Em Breve</a></li> 

     <li><a href="promocoes.php" title="Promoções" class="promo">Promoções</a></li> 

     <li><a href="corporativo.php" title="Corporativo" class="corp">Corporativo</a></li> 

     <li class="nbr"><a href="faleconosco.php" title="Fale Conosco" class="fale">Fale Conosco</a></li> 

    </ul>   <div class="footer"> 
    <p>©Centerplex 2016</p> 
    </div> 
<script> 
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ 
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), 
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) 
    })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); 

    ga('create', 'UA-3269539-1', 'auto'); 
    ga('send', 'pageview'); 

</script> 

    </body> 
</html> 

Answer

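As far as I can tell, there is no bug in your code that would cause this. It looks like the server returns different content depending on the request. The HTML you received references mobile resources (for example `mobile/filme.php` and `wallpaper_mobile`), which suggests the server is sending its mobile version to a client it does not recognise as a desktop browser. Try making the request with the Apache HttpClient library and mimicking a browser request: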

import java.io.IOException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;

public class NewClass {
    public static void main(String[] args) throws IOException {
      String HOST = "www.centerplex.com.br";
      // A browser loads a page with GET, so use HttpGet rather than HttpPost.
      HttpGet get = new HttpGet("http://" + HOST + "/");
      // Browser-like headers. HttpClient adds Host and Connection itself and
      // negotiates gzip/deflate on its own, so they are not set by hand here.
      get.setHeader("Accept", "*/*");
      get.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36");
      get.setHeader("DNT", "1");
      get.setHeader("Accept-Language", "en-GB,en-US;q=0.8,en;q=0.6");
      get.setHeader("Cookie", "_gat=1; _ga=GA1.2.904730494.1449539712");
      // try-with-resources closes both the client and the response.
      try (CloseableHttpClient client = HttpClientBuilder.create().build();
           CloseableHttpResponse response = client.execute(get)) {
        // Uses the charset from the response's Content-Type header; falls back to
        // ISO-8859-1 (the charset the returned page declares) if none is sent.
        String responseText = EntityUtils.toString(response.getEntity(), "ISO-8859-1");
        System.out.println(responseText);
      }
    }
}
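
If you would rather stay with plain `URLConnection`, as in the question, the same idea can probably be tried by setting a `User-Agent` header before reading the stream. The sketch below is only an illustration under assumptions: `UserAgentDemo`, `fetch` and `startServer` are hypothetical names, and the tiny local server merely imitates a site that picks its page from the `User-Agent`; how www.centerplex.com.br actually decides is not known from the question.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class UserAgentDemo {

    /** Reads a page; if userAgent is non-null it is sent as the User-Agent header. */
    public static String fetch(String urlStr, String charset, String userAgent) throws IOException {
        URLConnection conn = new URL(urlStr).openConnection();
        if (userAgent != null) {
            // Must be set before the connection is used (before getInputStream()).
            conn.setRequestProperty("User-Agent", userAgent);
        }
        StringBuilder html = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    /** Hypothetical stand-in for a site that varies its response by User-Agent. */
    public static HttpServer startServer() throws IOException {
        // Port 0 lets the OS pick a free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean browser = ua != null && ua.startsWith("Mozilla/");
            byte[] body = (browser ? "desktop page" : "mobile page")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            // Without the header, Java sends its own default User-Agent ("Java/<version>").
            System.out.print(fetch(url, "UTF-8", null));
            // With a browser-like User-Agent the stand-in server switches pages.
            System.out.print(fetch(url, "UTF-8", "Mozilla/5.0 (Windows NT 6.3; WOW64)"));
        } finally {
            server.stop(0);
        }
    }
}
```

Run against the stand-in server, `main` prints `mobile page` and then `desktop page`. For the real site you would call `fetch("http://www.centerplex.com.br/", "ISO-8859-1", ...)` with a full browser `User-Agent` string and compare the result with what the browser shows; if the two still differ, other headers or cookies may be involved.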