2016-02-26 53 views
0

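I want to scrape the BBC football results site to get teams, shots, goals, cards and incidents. I currently pass the data for 3 teams into the URL, then loop through the scraped data and output the results.

I wrote the script in Python using the Beautiful Soup (bs4) package. When the results are output to the screen, the first team is printed, then the first and second teams, then the first, second and third teams. So the first team is effectively printed 3 times, when I am trying to get each of the 3 teams only once.

Once I have this problem sorted, I will write the results to a file. I add the team data to DataFrames and then append those to a list (I am not sure this is the best approach). I'm sure it has something to do with the for loop, but I'm not sure how to fix it. Code: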

from bs4 import BeautifulSoup 
import urllib2 
import pandas as pd 


out_list = [] 
for numb in ('EFBO839787', 'EFBO839786', 'EFBO815155'):

    url = 'http://www.bbc.co.uk/sport/football/result/partial/' + numb + '?teamview=false'
    teams_list = []
    inner_page = urllib2.urlopen(url).read()
    soupb = BeautifulSoup(inner_page, 'lxml')

for report in soupb.find_all('td', 'match-details'): 
      home_tag = report.find('span', class_='team-home') 
      home_team = home_tag and ''.join(home_tag.stripped_strings) 

      score_tag = report.find('span', class_='score') 
      score = score_tag and ''.join(score_tag.stripped_strings) 

      shots_tag = report.find('span', class_='shots-on-target') 
      shots = shots_tag and ''.join(shots_tag.stripped_strings) 

      away_tag = report.find('span', class_='team-away') 
      away_team = away_tag and ''.join(away_tag.stripped_strings) 

      df = pd.DataFrame({'away_team' : [away_team], 'home_team' : [home_team], 'score' : [score], }) 
      out_list.append(df) 

for shots in soupb.find_all('td', class_='shots'): 

       home_shots_tag = shots.find('span',class_='goal-count-home') 
       home_shots = home_shots_tag and ''.join(home_shots_tag.stripped_strings) 

       away_shots_tag = shots.find('span',class_='goal-count-away') 
       away_shots = away_shots_tag and ''.join(away_shots_tag.stripped_strings) 

       dfb = pd.DataFrame({'home_shots': [home_shots], 'away_shots' : [away_shots] }) 
       out_list.append(dfb) 

for incidents in soupb.find("table", class_="incidents-table").find("tbody").find_all("tr"): 

        home_inc_tag = incidents.find("td", class_="incident-player-home") 
        home_inc = home_inc_tag and ''.join(home_inc_tag.stripped_strings) 

        type_inc_goal_tag = incidents.find("td", "span", class_="incident-type goal") 
        type_inc_goal = type_inc_goal_tag and ''.join(type_inc_goal_tag.stripped_strings) 

        type_inc_tag = incidents.find("td", class_="incident-type") 
        type_inc = type_inc_tag and ''.join(type_inc_tag.stripped_strings) 

        time_inc_tag = incidents.find('td', class_='incident-time') 
        time_inc = time_inc_tag and ''.join(time_inc_tag.stripped_strings) 

        away_inc_tag = incidents.find('td', class_='incident-player-away') 
        away_inc = away_inc_tag and ''.join(away_inc_tag.stripped_strings) 

        df_incidents = pd.DataFrame({'home_player' : [home_inc],'event_type' : [type_inc_goal],'event_time': [time_inc],'away_player' : [away_inc]}) 

        out_list.append(df_incidents) 


print "end" 

print out_list 

I am new to Python and Stack Overflow, so any suggestions on formatting my question would also be useful.

Thanks in advance!

Your indentation is off, so the loops do not line up correctly. Please fix it. I'd also suggest reading [PEP8](https://www.python.org/dev/peps/pep-0008/) – ffledgling

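This looks like a printing problem: at what indentation level are you printing `out_list`? It should be at *zero* indentation, all the way to the left of your code. Either that, or move `out_list` *into* the top of the loop so that it is reassigned on every iteration. – ffledgling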

Thanks @ffledgling, that was the problem. I'm new to Python and didn't understand how this works. Thanks – paulg

Answers

0

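This looks like a printing problem: at what indentation level are you printing out_list?

It should be at zero indentation, all the way to the left of your code.

Either that, or you want to move out_list into the top of the loop, so that it is reassigned on every iteration.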
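To make the two options concrete, here is a minimal sketch written for Python 3. It does no scraping: `fetch_result` is a hypothetical stand-in for the BeautifulSoup code in the question, and `snapshots` and `per_match` are names made up for this illustration.

```python
# Hypothetical stand-in for the scraping step; returns one row per match id.
def fetch_result(match_id):
    return {'match': match_id}

match_ids = ('EFBO839787', 'EFBO839786', 'EFBO815155')

# Symptom: out_list keeps growing, so anything reported inside the loop
# shows 1 row, then 2 rows, then 3 rows.
out_list = []
snapshots = []
for match_id in match_ids:
    out_list.append(fetch_result(match_id))
    snapshots.append(len(out_list))  # what a print placed here would show

# Option 1: print once, at zero indentation, after the loop has finished.
print(out_list)

# Option 2: create the list inside the loop so it is reset on each iteration.
for match_id in match_ids:
    per_match = []
    per_match.append(fetch_result(match_id))
    print(per_match)  # only the current match
```

Option 1 suits writing everything to one file at the end; option 2 suits handling each match separately.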

1

These 3 for loops should be inside your main for loop.

out_list = [] 
for numb in('EFBO839787', 'EFBO839786', 'EFBO815155'): 
    url = 'http://www.bbc.co.uk/sport/football/result/partial/' + numb + '?teamview=false' 
    teams_list = [] 
    # Python 3 spelling (needs "import urllib.request"); on Python 2 use urllib2.urlopen(url)
    inner_page = urllib.request.urlopen(url).read()
    soupb = BeautifulSoup(inner_page, 'lxml') 

    for report in soupb.find_all('td', 'match-details'): 
       # your code as it is 

    for shots in soupb.find_all('td', class_='shots'): 
       # your code as it is 

    for incidents in soupb.find("table", class_="incidents-table").find("tbody").find_all("tr"): 
       # your code as it is 

It works fine: each team appears only once.

Here is the output of the first loop:

[{'score': ['1-3'], 'away_team': ['Man City'], 'home_team': ['Dynamo Kiev']}, 
{'score': ['1-0'], 'away_team': ['Zenit St P'], 'home_team': ['Benfica']}, 
{'score': ['1-2'], 'away_team': ['Boston United'], 'home_team': ['Bradford Park Avenue']}]
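On the question of whether a list of one-row DataFrames is the best container: one alternative is to collect plain dicts and write them out in a single step at the end. The sketch below is only an illustration under that assumption. The rows are copied from the output above rather than scraped, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Rows copied from the output above; in the real script each dict would be
# appended inside the loop over match ids.
rows = [
    {'home_team': 'Dynamo Kiev', 'away_team': 'Man City', 'score': '1-3'},
    {'home_team': 'Benfica', 'away_team': 'Zenit St P', 'score': '1-0'},
    {'home_team': 'Bradford Park Avenue', 'away_team': 'Boston United', 'score': '1-2'},
]

# Write every row in one go; an in-memory buffer stands in for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['home_team', 'score', 'away_team'])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

If pandas is preferred, the same list of dicts can be passed to `pd.DataFrame(rows)` to build a single frame instead of many one-row frames.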