2015-03-25 122 views

I need to retrieve the href that contains /questions/20702626/javac1-8-class-not-found, but the code below outputs //stackoverflow.com instead. How can I retrieve the first href from the div tag?

from bs4 import BeautifulSoup 
import urllib2 

url = "http://stackoverflow.com/search?q=incorrect+operator" 
content = urllib2.urlopen(url).read() 

soup = BeautifulSoup(content) 

for tag in soup.find_all('div'): 
    if tag.get("class")==['summary']: 
     for tag in soup.find_all('div'): 
      if tag.get("class")==['result-link']: 
       for link in soup.find_all('a'): 
         print link.get('href') 
        break; 

Answer

Instead of nesting loops, write a CSS selector:

for link in soup.select('div.summary div.result-link a'): 
    print link.get('href') 

This is not only more readable, it also solves your problem. It prints:

/questions/11977228/incorrect-answer-in-operator-overloading 
/questions/8347592/sizeof-operator-returns-incorrect-size 
/questions/23984762/c-incorrect-signature-for-assignment-operator 
... 
/questions/24896659/incorrect-count-when-using-comparison-operator 
/questions/7035598/patter-checking-check-of-incorrect-number-of-operators-and-brackets 
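
As for why the nested loops printed //stackoverflow.com: the inner loops call soup.find_all, which searches the whole document again instead of only the div that was just matched, so the first link anywhere on the page comes out first. Here is a minimal sketch of the difference. The HTML snippet is made up for illustration and only assumes the summary / result-link nesting described in the question; the real search page may look different.

```python
from __future__ import print_function  # keeps the sketch usable on Python 2 and 3

from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure from the question.
HTML = """
<a href="//stackoverflow.com">Stack Overflow</a>
<div class="summary">
  <div class="result-link">
    <a href="/questions/20702626/javac1-8-class-not-found">javac1.8 class not found</a>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Searching from `soup` ignores the enclosing div, so the header link wins.
first_anywhere = soup.find_all("a")[0].get("href")

# The CSS selector only matches links inside div.summary div.result-link.
first_result = soup.select("div.summary div.result-link a")[0].get("href")

# Equivalent nested loops, each one searching from the tag matched before it.
scoped = [
    a.get("href")
    for summary in soup.find_all("div", class_="summary")
    for result in summary.find_all("div", class_="result-link")
    for a in result.find_all("a")
]

print(first_anywhere)
print(first_result)
print(scoped)
```

The point is that find_all is relative to the object it is called on: calling it on the matched tag keeps the search inside that tag, while calling it on soup starts over from the top of the page.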

Additional note: you may want to consider using the StackExchange API instead of the current web-scraping / HTML-parsing approach.
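
A rough sketch of what the API route could look like, written for Python 3 (the question uses Python 2). It assumes the /2.3/search/advanced endpoint with the q, site, order and sort parameters, and a JSON response whose items each carry a link field; please check the current API documentation before relying on any of that. The function names are my own and purely illustrative.

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the StackExchange API documentation.
API_ROOT = "https://api.stackexchange.com/2.3/search/advanced"


def build_search_url(query, site="stackoverflow"):
    """Build the search URL for a free-text query (hypothetical helper)."""
    params = [("order", "desc"), ("sort", "relevance"), ("q", query), ("site", site)]
    return API_ROOT + "?" + urlencode(params)


def extract_links(payload):
    """Pull the question URLs out of a decoded API response."""
    return [item["link"] for item in payload.get("items", [])]


def fetch_question_links(query):
    """Query the API over the network and return the matching question URLs."""
    with urlopen(build_search_url(query)) as response:
        raw = response.read()
        # The API normally compresses responses, so decompress when flagged.
        if response.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return extract_links(json.loads(raw.decode("utf-8")))


# Example call (needs network access):
# for link in fetch_question_links("incorrect operator"):
#     print(link)
```

Compared with scraping, this does not break when the page markup changes, although the API applies request quotas for unauthenticated use.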