2016-01-22

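Beautiful Soup: looping through a URL to display data

I am using Beautiful Soup to scrape this URL: http://www.gbgb.org.uk/resultsRace.aspx?id=1839041. It works fine and shows all the required fields. However, it only shows one race from the fixture's results card, and I want to extract the whole race meeting, which varies from 9 to 14 races. Here is the URL for the whole meeting: http://www.gbgb.org.uk/resultsMeeting.aspx?id=135488. Is there any way to loop through the complete race card and display the contents of every race on the card? Below is the code for a single race.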

from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib import urlopen

from bs4 import BeautifulSoup

html = urlopen("http://www.gbgb.org.uk/resultsRace.aspx?id=1839041")

# name the parser explicitly so bs4 does not have to guess one
bsObj = BeautifulSoup(html, "html.parser")

# race header fields
nameList = bsObj.findAll("div", {"class": "track"})
for name in nameList:
    print(name.get_text())

nameList = bsObj.findAll("div", {"class": "date"})
for name in nameList:
    print(name.get_text())

nameList = bsObj.findAll("div", {"class": "datetime"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("div", {"class": "grade"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("div", {"class": "distance"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("div", {"class": "prizes"})
for name in nameList:
    print(name.get_text())
# per-runner fields
nameList = bsObj.findAll("li", {"class": "first essential fin"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "essential greyhound"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "trap"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "sp"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "timeSec"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "timeDistance"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "essential trainer"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "first essential comment"})
for name in nameList:
    print(name.get_text())
# race footer
nameList = bsObj.findAll("div", {"class": "resultsBlockFooter"})
for name in nameList:
    print(name.get_text())

Could you please mark my answer as correct, or comment on how it could be improved? – ncfirth

Answer

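You just need to iterate over the result blocks. The tags are slightly different on the meeting page, but essentially the same. I used the Inspect Element feature in Chrome, which makes HTML scraping easy.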

from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib import urlopen

from bs4 import BeautifulSoup

baseURL = 'http://www.gbgb.org.uk/resultsMeeting.aspx?id=135488'
html = urlopen(baseURL)
bsObj = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package
# iterate over the result blocks on the meeting page
nameList = bsObj.findAll("div", {"class": "resultsBlock"})
for i in nameList:
    # just the trap info, the rest is similar
    nameList2 = i.findAll("li", {"class": "trap"})
    for j in nameList2:
        print(j.get_text())
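
To go beyond the trap number, one option is to loop over a list of (tag, class) pairs inside each resultsBlock, so that every field stays grouped with its race. The following is only a minimal sketch and has not been run against the live site: SAMPLE_HTML is a made-up stand-in for the meeting page, FIELDS and parse_meeting are illustrative names, and the class names are copied from the question's code on the assumption that they also appear inside each resultsBlock.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the meeting page. The real markup
# may differ, so check it with your browser's inspector first.
SAMPLE_HTML = """
<div class="resultsBlock">
  <div class="track">Example Track</div>
  <div class="grade">A3</div>
  <ul><li class="trap">1</li><li class="essential greyhound">Dog A</li></ul>
  <ul><li class="trap">2</li><li class="essential greyhound">Dog B</li></ul>
</div>
<div class="resultsBlock">
  <div class="track">Example Track</div>
  <div class="grade">A5</div>
  <ul><li class="trap">4</li><li class="essential greyhound">Dog C</li></ul>
</div>
"""

# (tag, class) pairs to pull from each race; extend with the other
# classes from the question (date, distance, sp, timeSec, ...) as needed.
FIELDS = [
    ("div", "track"),
    ("div", "grade"),
    ("li", "trap"),
    ("li", "essential greyhound"),
]


def parse_meeting(html):
    """Return one dict per resultsBlock, mapping class name -> list of texts."""
    soup = BeautifulSoup(html, "html.parser")
    races = []
    for block in soup.find_all("div", {"class": "resultsBlock"}):
        race = {}
        for tag, cls in FIELDS:
            # search only inside this block so values stay tied to their race
            elements = block.find_all(tag, {"class": cls})
            race[cls] = [el.get_text(strip=True) for el in elements]
        races.append(race)
    return races


races = parse_meeting(SAMPLE_HTML)
for number, race in enumerate(races, start=1):
    print("Race", number, race)
```

For the real page, pass the downloaded HTML (for example urlopen(baseURL).read()) to parse_meeting instead of SAMPLE_HTML. Printing len(races) is also a quick way to see how many result blocks were actually found for a meeting.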

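Hi ncfirth, thank you very much for your reply. I have run into some problems, though. Following your instructions I managed to get all the fields, but the loop rarely gets past the seventh or eighth race (most meetings have 12 or 14 races), and occasionally in the last race only 4 or 5 dogs are actually shown instead of 6. For some meetings I also keep getting "Process finished with exit code 0". Sometimes this goes away if I refresh, but for some meetings it does not. I have tried looking for peculiarities in the source code around the seventh or eighth race, but the races all look the same. – moonshadow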

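If you have another question about using bs4, I suggest you open a separate question; it looks like the error may be more complex than this question. – ncfirth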