2017-02-10 141 views
1

Timeout error on an HTTP GET request in Golang

I am trying to fetch the HTML source of Reddit with Go:

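package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "time"
)

func main() {
    // Timeout bounds one whole request: connecting, redirects, and reading the body.
    client := http.Client{
        Timeout: 5 * time.Second,
    }

    resp, err := client.Get("https://www.reddit.com/")
    if err != nil {
        log.Fatal(err)
    }
    // Defer the close only after the error check, so resp is known to be non-nil.
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("HTML:\n\n", string(body))

    // Wait for Enter so the console window stays open.
    var input string
    fmt.Scanln(&input)
}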

The first attempt works fine, but on the second one I get this error:

<p>we're sorry, but you appear to be a bot and we've seen too many requests 
from you lately. we enforce a hard speed limit on requests that appear to come 
from bots to prevent abuse.</p> 

<p>if you are not a bot but are spoofing one via your browser's user agent 
string: please change your user agent string to avoid seeing this message 
again.</p> 

<p>please wait 6 second(s) and try again.</p> 

    <p>as a reminder to developers, we recommend that clients make no 
    more than <a href="http://github.com/reddit/reddit/wiki/API">one 
    request every two seconds</a> to avoid seeing this message.</p> 

I tried setting a delay, but it still doesn't work. Sorry for my poor English.


The response from Reddit looks easy to understand. Read it twice. – ymonad

Answers

0

Reddit does not want automated scanners or crawlers on its site and has a bot-protection mechanism. Here is their recommendation:

one request every two seconds

Just add a delay between requests.


I set a timeout, but it still doesn't work: timeout := time.Duration(5 * time.Second); client := http.Client{Timeout: timeout} –


Not a timeout, a delay. Try adding 'time.Sleep(2000 * time.Millisecond)' before the Get. –

0

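A timeout serves a different purpose: Client.Timeout is an upper bound on how long a single request may take, not a pause between requests. What you need is a sleep between subsequent requests: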

time.Sleep(6 * time.Second) 

I added time.Sleep below Get and ReadAll, but it still doesn't work –


Can you show me the updated code? – Fallen


http://menly.ml/view/0f76a6c2 –