2017-10-09 102 views
1

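BeautifulSoup.find_all does not retrieve elements of the web page

I am using BeautifulSoup to extract data from a random website. I am trying to find all div tags with the class name simpleList, but no data is collected; it just prints an empty list.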

   
 
     </div>
     <div class="clear"></div>
     <div id="locationSearchResults" class="simpleList">
       <div class="result ">
        <span class="cell cellBorder normalWidth" onclick="document.location='/real-estate/rock-spring-ga/LCGAROCKSPRING/';">
        <a onclick="Track.doEvent('Location Search Results', 'Select Listings', 'Rock Spring, GA');" tabindex="2" title="Listings in Rock Spring, GA" class="suggestCollapse" href="/real-estate/rock-spring-ga/LCGAROCKSPRING/"><b>Rock Spring, GA</b></a>
        </span>
        <span class="cell cellBorder normalWidth"><a onclick="Track.doEvent('Location Search Results', 'Select Homes for Sale', 'Rock Spring, GA');" title="Homes for Sale in Rock Spring, GA" href="/real-estate/rock-spring-ga/LCGAROCKSPRING/">56 Listings</a></span>
        <span class="cell cellBorder normalWidth disabled"><a onclick="return false;" title="Rentals in Rock Spring, GA" href="/real-estate/rock-spring-ga/LCGAROCKSPRING/?ty=3">0 Rentals</a></span>
        <span class="cell cellBorder normalWidth disabled"><a onclick="return false;" title="Agents in Rock Spring, GA" href="/real-estate-agents/rock-spring-ga/LCGAROCKSPRING/">0 Agents</a></span>
 
       

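import requests
from bs4 import BeautifulSoup

# The URL has to be one string literal; a line break inside the quotes is a syntax error.
r = requests.get("http://www.century21.com/locationsearch.c21?q=Rock+Spring&v=0#r=10&l=Rock+Spring&c=1")
c = r.content
soup = BeautifulSoup(c, "html.parser")
print(soup)
# "results" instead of "all", so the built-in all() is not shadowed.
results = soup.find_all("div", {"class": "simpleList"})
print(results)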

What could be wrong?

Answers

0

The problem is with the HTML parser you are using. Use lxml or html5lib instead.

I used html5lib and it worked fine:

# soup is assumed to have been built with BeautifulSoup(c, "html5lib")
divs = soup.find('div', {'class': 'simpleList'}).find_all('div')
print(len(divs))

It gave me 12.
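One caveat with chaining find(...).find_all(...) is that find returns None when nothing matches, so the chain raises AttributeError on a page where the container is missing. A minimal sketch of a guarded version follows. SAMPLE_HTML and nested_divs are hypothetical names used only for illustration, and the default parser is "html.parser" only so the sketch runs without extra installs; pass "html5lib" to follow this answer.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the markup in the question.
SAMPLE_HTML = """
<div id="locationSearchResults" class="simpleList">
  <div class="result "><span class="cell">Rock Spring, GA</span></div>
  <div class="result "><span class="cell">56 Listings</span></div>
</div>
"""


def nested_divs(markup, parser="html.parser"):
    """Return the div tags nested inside div.simpleList, or [] if it is absent."""
    soup = BeautifulSoup(markup, parser)
    container = soup.find("div", class_="simpleList")
    if container is None:
        # Nothing matched: return an empty list instead of raising AttributeError.
        return []
    return container.find_all("div")


if __name__ == "__main__":
    print(len(nested_divs(SAMPLE_HTML)))
    print(nested_divs("<p>no container here</p>"))
```

An empty result from this helper means the container was not in the parsed tree at all, which is the symptom described in the question.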

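Edit:

The table below summarizes the advantages and disadvantages of each parser library:

[Image not preserved: "HTML Parsers" comparison table]

Source: https://www.crummy.com/software/BeautifulSoup/bs4/doc/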
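A rough way to check which parser suits a given page is to parse the same markup with every installed parser and compare what find_all returns. The sketch below is only an illustration under stated assumptions: SAMPLE_HTML is a hypothetical fragment modelled on the question's markup, and count_simple_lists is a made-up helper, not part of Beautiful Soup. Parsers that are not installed are skipped by catching FeatureNotFound.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical fragment modelled on the markup in the question.
SAMPLE_HTML = """
<div id="locationSearchResults" class="simpleList">
  <div class="result "><span class="cell">Rock Spring, GA</span></div>
</div>
"""


def count_simple_lists(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Map each available parser name to the number of div.simpleList tags it finds."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # This parser library is not installed; skip it.
            continue
        counts[name] = len(soup.find_all("div", class_="simpleList"))
    return counts


if __name__ == "__main__":
    print(count_simple_lists(SAMPLE_HTML))
```

On a small, well-formed fragment like this one the parsers should agree; differences tend to show up on malformed real-world pages, so the comparison is most informative when fed the actual response body.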

0

Try this:

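from bs4 import BeautifulSoup
from urllib.request import urlopen

# Replace the placeholder URL with the page you want to scrape.
html = urlopen("http://your_site.com")
# The lxml parser has to be installed separately (pip install lxml).
soup = BeautifulSoup(html, 'lxml')
results = soup.find_all('div', {'class': 'simpleList'})
print(results)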