HTTP GET request not working in Java when using HTTP/1.1?

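So I wrote some code that downloads 4chan pages. I fetch the raw HTML page and parse it for my needs. The code below was working fine, but it suddenly stopped working. When I run it, the server doesn't accept my request; it seems to be waiting for something more. But as far as I know, an HTTP request looks like this: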

GET /ck HTTP/1.1 
Host: boards.4chan.org 
(extra new line)

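If I change this format in any way, I get a "400 Bad Request" status code. But if I change HTTP/1.1 to HTTP/1.0, the server responds with "200 OK" and sends the whole page. That makes me think my mistake is in the Host header, since it became mandatory in HTTP/1.1. But I still can't figure out what exactly needs to change.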
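One likely explanation for the "waiting for something more" symptom, offered as a hedged sketch rather than a confirmed diagnosis: HTTP/1.1 connections are persistent by default, so after sending the response the server may keep the socket open, and a loop that reads until `readLine()` returns `null` keeps blocking until the server times out. With HTTP/1.0 the server closes the connection after the response, so the loop ends. The sketch below builds the request with explicit `\r\n` line endings and a `Connection: close` header asking the server to close the socket after responding. The class and method names (`RawHttpRequest`, `buildGet`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class RawHttpRequest {

    // Hypothetical helper: builds a minimal HTTP/1.1 GET request.
    // Every line ends with CRLF ("\r\n"), independent of the platform's
    // line.separator, and the header block ends with an empty line.
    static String buildGet(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                // Ask the server to close the socket after the response,
                // so a read loop that waits for end-of-stream can terminate.
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        String request = buildGet("boards.4chan.org", "/ck/");
        // These are the bytes you would write to the socket's OutputStream
        // (followed by flush()) instead of calling println().
        byte[] bytes = request.getBytes(StandardCharsets.US_ASCII);
        System.out.print(request);
        System.out.println("(" + bytes.length + " bytes)");
    }
}
```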

To get a whole board, the call is just this:

downloadHTMLThread("ck", -1);

or, for a specific thread, you just change -1 to that thread's number. For example, a link like the one below gives the following:

//http://boards.4chan.org/ck/res/3507158 
//url.getDefaultPort() is 80 
//url.getHost() is boards.4chan.org 
//url.getFile() is /ck/res/3507158 

downloadHTMLThread("ck", 3507158);
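The values in the comments above can be checked with a small sketch that builds the URL the same way `downloadHTMLThread` does and queries it with `java.net.URL`. The class name `ThreadUrlDemo` and the `threadUrl` helper are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ThreadUrlDemo {

    // Mirrors the URL construction in downloadHTMLThread:
    // thread == -1 means the board index "/<board>/",
    // otherwise "/<board>/res/<thread>".
    static URL threadUrl(String board, int thread) throws MalformedURLException {
        String path = "/" + board + (thread == -1 ? "/" : "/res/" + thread);
        return new URL("http://boards.4chan.org" + path);
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = threadUrl("ck", 3507158);
        System.out.println(url.getDefaultPort()); // 80 (default for http)
        System.out.println(url.getHost());        // boards.4chan.org
        System.out.println(url.getFile());        // /ck/res/3507158
    }
}
```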

Any advice would be appreciated, thanks.

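import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.net.URL;

public static final String BOARDS = "boards.4chan.org"; 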
public static final String IMAGES = "images.4chan.org"; 
public static final String THUMBS = "thumbs.4chan.org"; 
public static final String RES = "/res/"; 
public static final String HTTP = "http://"; 
public static final String SLASH = "/"; 

public String downloadHTMLThread(String board, int thread) { 
    BufferedReader reader = null; 
    PrintWriter out = null; 
    Socket socket = null; 
    String str = null; 
    StringBuilder input = new StringBuilder(); 

    try { 
     URL url = new URL(HTTP+BOARDS+SLASH+board+(thread==-1?SLASH:RES+thread)); 
     socket = new Socket(url.getHost(), url.getDefaultPort()); 
     reader = new BufferedReader(new InputStreamReader(socket.getInputStream())); 
     out = new PrintWriter(socket.getOutputStream(), true); 

     out.println("GET " +url.getFile()+ " HTTP/1.1"); 
     out.println("HOST: " + url.getHost()); 
     out.println(); 

     long start = System.currentTimeMillis(); 
     while ((str = reader.readLine()) != null) { 
      input.append(str).append("\r\n"); 
     } 
     long end = System.currentTimeMillis(); 

     System.out.println(input); 
     System.out.println("\nTime: " +(end-start)+ " milliseconds"); 

    } catch (Exception ex) { 
     ex.printStackTrace(); 
     input = null; 
    } finally { 
     if(reader!=null){ 
      try { 
       reader.close(); 
      } catch (IOException ioe) { 
       // nothing to see here 
      } 
     } 
     if(socket!=null){ 
      try { 
       socket.close(); 
      } catch (IOException ioe) { 
       // nothing to see here 
      } 
     } 
     if(out!=null){ 
      out.close(); 
     } 
    } 
    return input==null? null: input.toString(); 
}
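If the server keeps the connection open, the `readLine()` loop above cannot finish until the server gives up. An alternative, sketched here under the assumption that the response carries a `Content-Length` header and is not sent with `Transfer-Encoding: chunked` (chunked bodies are not handled), is to read the headers up to the blank line and then read exactly that many body bytes. It reads raw bytes from an `InputStream` rather than a `Reader`, because `Content-Length` counts bytes, not characters. `ResponseReader`, `readLine` and `readBody` are hypothetical names, and decoding the body as UTF-8 is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResponseReader {

    // Reads one line ending in "\r\n" (or a bare "\n"), without the terminator.
    // Returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            buf.write(b);
        }
        if (b == -1 && buf.size() == 0) {
            return null;
        }
        String line = buf.toString(StandardCharsets.ISO_8859_1.name());
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    // Skips the status line and headers, then reads exactly Content-Length bytes.
    // Assumes a Content-Length header; chunked responses are NOT handled.
    static String readBody(InputStream in) throws IOException {
        if (readLine(in) == null) { // status line, e.g. "HTTP/1.1 200 OK"
            throw new IOException("empty response");
        }
        int contentLength = -1;
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        if (contentLength < 0) {
            throw new IOException("no Content-Length header (chunked bodies not handled)");
        }
        byte[] body = new byte[contentLength];
        int off = 0;
        while (off < contentLength) {
            int n = in.read(body, off, contentLength - off);
            if (n == -1) {
                throw new IOException("connection closed before the body was complete");
            }
            off += n;
        }
        // Assumption: UTF-8 body; a fuller client would honor the Content-Type charset.
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String raw = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readBody(in)); // prints "hello"
    }
}
```

With a real socket you would pass `socket.getInputStream()` after sending the request; because the read stops at `Content-Length`, it does not depend on the server closing the connection.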


2012-03-27 Shawn

Try using Apache HttpClient instead of rolling your own:

static String getUriContentsAsString(String uri) throws IOException { 
    HttpClient client = new DefaultHttpClient(); 
    HttpResponse response = client.execute(new HttpGet(uri)); 
    return EntityUtils.toString(response.getEntity()); 
}
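Since the asker says in a comment below that they would rather avoid third-party libraries, a JDK-only counterpart of `getUriContentsAsString` can be sketched with `java.net.HttpURLConnection`, which handles the HTTP/1.1 details (Host header, persistent connections, chunked bodies) itself. This is a swapped-in alternative, not what the answer proposed; the class name `JdkGet` is hypothetical and decoding the body as UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class JdkGet {

    // JDK-only counterpart of getUriContentsAsString (no Apache HttpClient).
    // Throws IOException for error status codes (4xx/5xx) from getInputStream().
    static String getUriContentsAsString(String uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(uri).openConnection();
        conn.setRequestMethod("GET");
        // true is the default; redirects that switch protocol
        // (e.g. http to https) are not followed automatically.
        conn.setInstanceFollowRedirects(true);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            // Assumption: UTF-8 body; a fuller version would inspect Content-Type.
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(getUriContentsAsString(args[0]));
    }
}
```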

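If you're doing this to really understand the internals of HTTP client requests, you might start by playing with curl on the command line. That lets you see all the headers and the request body being sent. Then it should be a simple matter to adjust your request to match what curl sends.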


2012-03-27 18:20:08

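I have another version of the code that works with Apache, but I plan to adapt it for smartphones later, so I'd rather not use third-party libraries. – Shawn 2012-03-27 18:24:00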

Listen to James and do yourself a favor: use Apache HttpClient. If by smartphone you mean Android, [Apache HttpClient is built in](http://developer.android.com/reference/org/apache/http/package-summary.html). – 2012-03-27 18:31:02

I agree, but nothing beats writing the raw code yourself. I'm just curious, because this is getting annoying. – Shawn 2012-03-27 18:32:43

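By the way, I think your code is sending 'HOST' instead of 'Host'. Since this is a mandatory header in HTTP/1.1 but is ignored in HTTP/1.0, that could be the problem. In any case, you can use a packet-capture program (e.g. Wireshark) just to make sure. Using println is very convenient, but the line separator it appends depends on the system property line.separator. I think (though I'm not sure) that the line separator used in the HTTP protocol must be '\r\n'. If you capture the packets, I think it's a good idea to check that every line sent ends with '\r\n' (bytes 0x0D 0x0A), in case your OS separator is different.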
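The line-separator point can also be checked without a packet capture. `PrintWriter.println()` appends `System.lineSeparator()` (taken from `line.separator`), which is `\n` on Linux and macOS and `\r\n` on Windows, whereas HTTP/1.1 specifies CRLF line endings. A hedged sketch that writes each line with an explicit `\r\n` and inspects the resulting bytes; `CrlfCheck`, `writeHttpLine` and `allLinesEndWithCrlf` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class CrlfCheck {

    // Writes one HTTP line terminated by CRLF, regardless of the platform's
    // line.separator (which println() would use instead).
    static void writeHttpLine(Writer w, String line) throws IOException {
        w.write(line);
        w.write("\r\n");
    }

    // True if every LF (0x0A) is preceded by CR (0x0D) and the data ends with CRLF.
    static boolean allLinesEndWithCrlf(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == 0x0A && (i == 0 || bytes[i - 1] != 0x0D)) {
                return false; // bare LF found
            }
        }
        return bytes.length >= 2
                && bytes[bytes.length - 2] == 0x0D
                && bytes[bytes.length - 1] == 0x0A;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sent = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(sent, StandardCharsets.US_ASCII));
        writeHttpLine(out, "GET /ck/ HTTP/1.1");
        writeHttpLine(out, "Host: boards.4chan.org");
        writeHttpLine(out, "");
        out.flush();
        System.out.println("CRLF only: " + allLinesEndWithCrlf(sent.toByteArray())); // prints "CRLF only: true"
        // For comparison, the separator println() would append on this machine:
        System.out.println("line.separator here: "
                + System.lineSeparator().replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```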


2012-03-27 17:58:50 BWitched

Use www.4chan.org as the host instead. Since boards.4chan.org is a 302 redirect to www.4chan.org, you won't be able to scrape anything from boards.4chan.org.
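To check whether a host answers with a redirect (such as the 302 described here, or the 301 the asker reports for 4chan.org), look at the status code and the `Location` header of the response. A sketch that parses both out of the raw response head a socket-based client like the one in the question would receive; `RedirectCheck`, `Head` and `parseHead` are hypothetical names, and the sample response in `main` is made up, not captured from 4chan.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class RedirectCheck {

    // Minimal view of a response head: status code plus headers.
    // Header names are lower-cased because HTTP header names are case-insensitive.
    static final class Head {
        final int status;
        final Map<String, String> headers;

        Head(int status, Map<String, String> headers) {
            this.status = status;
            this.headers = headers;
        }

        boolean isRedirect() {
            return status == 301 || status == 302 || status == 303
                    || status == 307 || status == 308;
        }

        String location() {
            return headers.get("location");
        }
    }

    // Parses text like "HTTP/1.1 302 Found\r\nLocation: ...\r\n\r\n".
    static Head parseHead(String raw) {
        String[] lines = raw.split("\r\n");
        // Status line "HTTP/1.1 302 Found": the second token is the code.
        int status = Integer.parseInt(lines[0].split(" ")[1]);
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        lines[i].substring(colon + 1).trim());
            }
        }
        return new Head(status, headers);
    }

    public static void main(String[] args) {
        // Made-up example response head, not captured from a real server.
        String raw = "HTTP/1.1 302 Found\r\nLocation: http://www.4chan.org/\r\n\r\n";
        Head head = parseHead(raw);
        if (head.isRedirect()) {
            // prints "Redirect 302 to http://www.4chan.org/"
            System.out.println("Redirect " + head.status + " to " + head.location());
        }
    }
}
```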


2012-03-27 18:11:07 GoalBased

I actually checked that already: when I use 4chan.org I get "301 Moved Permanently". I also checked the Host header in the Firefox console, and it shows boards.4chan.org. – Shawn 2012-03-27 18:21:20

Have you tried using www.4chan.org as the host? (Not 4chan.org.) – GoalBased 2012-03-27 20:16:28
