我是编程新手,我需要帮助。试图在golang上编写gitlab scraper。 当我试图在多线程模式下获取有关项目的信息时,出现了一些问题。gitlab通过golang抓取的问题
下面是代码:
func (g *Gitlab) getAPIResponce(url string, structure interface{}) error {
responce, responce_error := http.Get(url)
if responce_error != nil {
return responce_error
}
ret, _ := ioutil.ReadAll(responce.Body)
if string(ret) != "[]" {
err := json.Unmarshal(ret, structure)
return err
}
return errors.New(error_emptypage)
}
...
func (g *Gitlab) GetProjects() {
projects_chan := make(chan Project, g.LatestProjectID)
var waitGroup sync.WaitGroup
queue := make(chan struct{}, 50)
for i := g.LatestProjectID; i > 0; i-- {
url := g.BaseURL + projects_url + "/" + strconv.Itoa(i) + g.Token
waitGroup.Add(1)
go func(url string, channel chan Project) {
queue <- struct{}{}
defer waitGroup.Done()
var oneProject Project
err := g.getAPIResponce(url, &oneProject)
if err != nil {
fmt.Println(err.Error())
}
fmt.Printf(".")
channel <- oneProject
<-queue
}(url, projects_chan)
}
go func() {
waitGroup.Wait()
close(projects_chan)
}()
for project := range projects_chan {
if project.ID != 0 {
g.Projects = append(g.Projects, project)
}
}
}
这里是输出:
┌[ nb-kondrashov: ~/Gitlab/gitlab-auditor ]
└[ skondrashov ]-> ./gitlab-auditor
latest project = 1532
Gathering projects...
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Get https://gitlab.example.com/api/v4/projects/563&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/558&private_token=SeCrEt_ToKeN: unexpected EOF
..Get https://gitlab.example.com/api/v4/projects/531&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/571&private_token=SeCrEt_ToKeN: unexpected EOF
.Get https://gitlab.example.com/api/v4/projects/570&private_token=SeCrEt_ToKeN: unexpected EOF
..Get https://gitlab.example.com/api/v4/projects/467&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/573&private_token=SeCrEt_ToKeN: unexpected EOF
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
每当它的不同的项目,但它的id是大约550
当我试图从输出中扼制链接,我得到正常的JSON。当我试图用queue := make(chan struct{}, 1)
(单线程)运行这段代码时 - 一切都很好。
它可能是什么?
尝试限制同时连接的数量。 –
我想在这里做:'queue:= make(chan struct {},N)'(N =连接数),并且它有部分帮助,但是我失去了性能。这是gitlab,golang或我的电脑的问题吗? – sergkondr
我想50是太多了。可能存在某种限制连接数量的DDoS保护。尝试减少,例如5或10. –