2015-02-08 44 views

Iterating over a BeautifulSoup result set in Python

I am scraping data from a website (http://sports.yahoo.com/nfl/players/8800/) using urllib2 and BeautifulSoup. My code currently looks like this:

import re
import urllib2
from bs4 import BeautifulSoup

site = 'http://sports.yahoo.com/nfl/players/8800/'
response = urllib2.urlopen(site)
html = response.read()
soup = BeautifulSoup(html)
rushing = []
passing = []
receiving = []

# here is where my problem arises
for elem in soup.find_all('th', text=re.compile('2008')):
    passing = elem.parent.find_all('td', class_=re.compile('10'))
    rushing = elem.parent.find_all('td', class_=re.compile('20'))
    receiving = elem.parent.find_all('td', class_=re.compile('30'))

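The page has three sections in which soup.find_all(...'2008') finds a match, and each match shows up separately when printed. However, this for loop appears to run only once. How can I make sure the loop runs three times?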

Answer


As far as I understand, you need to extend() the lists you already defined before the loop:

rushing = [] 
passing = [] 
receiving = [] 

for elem in soup.find_all('th', text=re.compile('2008')): 
    passing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('10'))]) 
    rushing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('20'))]) 
    receiving.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('30'))]) 

print passing 
print rushing 
print receiving 

Prints:

[u'3'] 
[u'19', u'58', u'14.5', u'3.1', u'0'] 
[u'2', u'17', u'4.3', u'8.5', u'11', u'6.5', u'0']
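
The reason the original loop looks like it runs only once can probably be shown without any scraping. The sketch below assumes the loop does run three times and that each pass simply rebinds the variable, so only the last match survives. The `rows` list is a hypothetical stand-in for the cells that `elem.parent.find_all(...)` would return; its values are made up for illustration and are not taken from the page.

```python
# Hypothetical stand-in for the cells found in each of the three matching
# rows; real values would come from elem.parent.find_all(...).
rows = [["3"], ["19", "58"], ["2", "17", "4.3"]]

# Pattern from the question: plain assignment rebinds the name on every
# pass, discarding whatever the previous pass stored.
overwritten = []
iterations = 0
for cells in rows:
    overwritten = cells
    iterations += 1

# Pattern from the answer: extend() appends to the same list each time.
accumulated = []
for cells in rows:
    accumulated.extend(cells)

print(iterations)   # 3
print(overwritten)  # ['2', '17', '4.3']
print(accumulated)  # ['3', '19', '58', '2', '17', '4.3']
```

If the results need to stay grouped per match rather than flattened into one list, append(cells) in place of extend(cells) would keep one sub-list per row.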