2017-09-23 189 views

Ruby Net::HTTP 400 Bad Request

I am trying to scrape data from http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap. Here is my code, which uses Net::HTTP to send a GET request:

require 'net/http' 
require 'uri' 

def get_stocks() 
    uri = URI.parse('http://www.nasdaqomxnordic.com/aktier/listed-companies/stockholm') 
    response = Net::HTTP.get_response(uri) 
    puts response 
end 

get_stocks() 

Other sites I tested work fine and respond with 200 OK, but http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap returns #<Net::HTTPBadRequest:0x00007ffe8f84ec30>, and I don't understand why.

For more context, response.body returns:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="TEXT/HTML; CHARSET=utf-8"/> 
<title>400 Bad Request</title></head> 
<body> 
    <H2>400 Bad Request</H2> 
    <p>The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.</p> 
    <p>This page can't be displayed.<br/>The incident ID is: 10039581164792379.</p> 
    <p>If you would like assistance, please contact the Support for additional information.<br></p> 
</body> 
</html> 

What can I do to get a 200 OK?

Any help is much appreciated. Thanks in advance!

Answer


I think you need to set the User-Agent header on the request. The following code works.

require 'net/http' 
require 'uri' 

def get_stocks() 
    uri = URI.parse("http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap") 
    http = Net::HTTP.new(uri.host, uri.port) 
    request = Net::HTTP::Get.new(uri.request_uri) 
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36' 
    request.initialize_http_header({"User-Agent" => user_agent}) 

    response = http.request(request) 
    puts response.inspect 
end 

get_stocks() # #<Net::HTTPOK 200 OK readbody=true> 

The page content is then available via response.body.
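
As a side note, headers on a Net::HTTP request object can also be set with the `[]=` accessor, which is the form seen more often in Ruby code. Below is a minimal sketch of that variant. The helper name `build_request` and the `USER_AGENT` constant are made up for illustration and are not part of the code above. I have only checked that the header ends up on the request object; I am assuming, not confirming, that the server treats it the same way.

```ruby
require 'net/http'
require 'uri'

# Same browser-like User-Agent string as in the answer above.
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36'

# Hypothetical helper (illustration only): builds a GET request with a
# custom User-Agent header. It does not send anything over the network.
def build_request(url, user_agent)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  # []= replaces only this header and leaves the other defaults in place.
  request['User-Agent'] = user_agent
  request
end

request = build_request('http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap', USER_AGENT)
puts request['User-Agent']
```

To actually send it, pass the request to `http.request(request)` exactly as in the code above. Unlike `initialize_http_header`, which replaces the whole header set, `[]=` keeps the default headers that Net::HTTP adds.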


Thank you, I got the response body! You saved me a lot of frustration! – Villevillekulla