2015-10-15

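BeautifulSoup does not get the page source when using Selenium

I am trying to scrape a web page, but I cannot get the site's HTML text using Selenium.

Here is my code so far: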

from selenium import webdriver 
from selenium.webdriver.common.keys import Keys 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 
from bs4 import BeautifulSoup 
import urlparse 

search_term = raw_input("What is your search term?: ") 
url = "https://www.google.co.uk/search?client=ubuntu&channel=fs&q=" 
googurl = url+search_term 
driver = webdriver.Firefox() 

htmltext = driver.get(googurl) 
soup = BeautifulSoup(htmltext.page_source) 

Running this gives me the following traceback:

What is your search term?: hi 
Traceback (most recent call last): 
    File "google page click.py", line 15, in <module> 
    soup = BeautifulSoup(htmltext.page_source) 
AttributeError: 'NoneType' object has no attribute 'page_source' 

Answer

Always use the driver object. driver.get() returns None, so the page source has to be read from driver itself:

driver.get(googurl) 
soup = BeautifulSoup(driver.page_source) 
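For anyone on Python 3, here is a minimal sketch of the same fix, assuming Selenium, a Firefox driver, and bs4 are installed. The helper names build_search_url and fetch_soup are made up for illustration and are not part of either library. The search term is URL-encoded with urllib.parse.quote_plus, and "html.parser" is passed explicitly so BeautifulSoup does not have to guess a parser.

```python
from urllib.parse import quote_plus

# Same base URL as in the question.
BASE_URL = "https://www.google.co.uk/search?client=ubuntu&channel=fs&q="


def build_search_url(search_term):
    """Hypothetical helper: append the URL-encoded search term to the base URL."""
    return BASE_URL + quote_plus(search_term)


def fetch_soup(url):
    """Hypothetical helper: load url in Firefox and return the parsed page."""
    # Imported here so the URL helper above works without Selenium or bs4.
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        # get() only navigates and returns None, so its result is not kept.
        driver.get(url)
        # The HTML is read from the driver object itself.
        return BeautifulSoup(driver.page_source, "html.parser")
    finally:
        # Close the browser even if loading or parsing fails.
        driver.quit()
```

Calling fetch_soup(build_search_url("hi")) should return a BeautifulSoup object for the results page, provided the browser starts and the page loads.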

Thanks for this, it's working now. – booberz