2015-03-03 73 views
1

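Selenium: check whether an element exists and click it

I am using Selenium with Scrapy to scrape content from a dynamic website. I am new to Selenium. I am extracting a wine list from here. The site has a "show more" button which, when clicked, displays more wines. So far I can only click the button once and then extract the wine list, but I need to keep clicking until the "show more" button is no longer displayed. Any help with this would be much appreciated. Here is my code so far: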

# -*- coding: utf-8 -*-

import time

from scrapy.contrib.spiders import CrawlSpider
from selenium import webdriver


class WineSpider(CrawlSpider):
    name = "wspider"
    allowed_domains = ["vivino.com"]
    start_urls = ["http://www.vivino.com/wineries/francis-ford-coppola/"]

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Load the page in the browser so that its JavaScript runs.
        self.driver.get(response.url)
        links = []

        time.sleep(5)

        # Select the "show more" button and click it (only once so far).
        click = self.driver.find_elements_by_xpath("//*[@id='btn-more-wines']")
        click[0].click()
        time.sleep(5)

        wines = self.driver.find_elements_by_xpath('//a[@class = "link-muted"]')
        for w in wines:
            links.append(w.get_attribute("href"))

        print(len(links))
        self.driver.close()

Any help would be greatly appreciated.

Answers

-1

If I were you, I would try the following. Keep the action emulation, which in your case is the click on the "show more" button, in a separate function:

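from selenium.common.exceptions import ElementNotVisibleException

def emulate_action(self):
    # Click "show more" once; return False when it can no longer be clicked.
    try:
        click = self.driver.find_elements_by_xpath("//*[@id='btn-more-wines']")
        click[0].click()
        time.sleep(5.0)
        return True
    except (ElementNotVisibleException, IndexError):
        # The button is hidden or gone from the page: everything is loaded.
        print("All elements displayed")
        return False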

Then call it until all the wines have been loaded:

while True:
    # Keep clicking until emulate_action() reports that the button is gone.
    if not self.emulate_action():
        break

After that, this code should hopefully solve your problem, if I am not mistaken:

wines = self.driver.find_elements_by_xpath('//a[@class = "link-muted"]')
for w in wines:
    links.append(w.get_attribute("href"))

print(len(links))
self.driver.close()

Let me know whether this approach works for you!
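
One caveat with the loop in this answer: if the button never stops being clickable, the loop never ends. Below is a minimal sketch of the same click-until-gone idea with an upper bound on the number of clicks. The names `click_until_gone` and `max_clicks` are hypothetical and not part of the answer; `click_once` stands in for any function like `emulate_action` that returns True after a successful click and False otherwise.

```python
def click_until_gone(click_once, max_clicks=100):
    """Call click_once() until it returns False or max_clicks is reached.

    click_once: callable returning True after a successful click,
                False once the button can no longer be clicked.
    max_clicks: safety cap (an assumption, pick what suits the site).
    Returns the number of successful clicks.
    """
    clicks = 0
    while clicks < max_clicks:
        if not click_once():
            break  # button is gone, everything is loaded
        clicks += 1
    return clicks
```

Inside the spider this would presumably be called as `click_until_gone(self.emulate_action)`, before collecting the links.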

3

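Make an endless loop, use an Explicit Wait to wait for the "show more" button to appear, and break out of the loop once "show more" is no longer visible (no more wines left). Only then parse the results: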

from selenium import webdriver 
from selenium.common.exceptions import TimeoutException 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 


driver = webdriver.Firefox() 
driver.get("http://www.vivino.com/wineries/francis-ford-coppola/") 

while True:
    try:
        button = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.ID, "btn-more-wines")))
    except TimeoutException:
        break  # no more wines

    button.click()  # load more wines


wines = driver.find_elements_by_xpath('//a[@class = "link-muted"]') 

links = [w.get_attribute("href") for w in wines] 

driver.close() 

Note that an explicit wait is a real game changer here: it makes your code more reliable and faster than hard-coded time.sleep delays.
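
To see why an explicit wait tends to be faster than a fixed sleep, here is a simplified model of the polling idea behind it. This is a sketch, not Selenium's actual implementation: `wait_until` and `WaitTimeout` are hypothetical names, and the `clock` and `sleep` parameters exist only so the loop can be exercised without real delays.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=5.0, poll=0.5, clock=time.time, sleep=time.sleep):
    """Poll condition() until it returns a truthy value or timeout expires.

    Returns the truthy value as soon as it appears, so the caller waits
    only as long as needed instead of always sleeping the full timeout.
    """
    deadline = clock() + timeout
    while True:
        value = condition()
        if value:
            return value
        if clock() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        sleep(poll)
```

With `time.sleep(5)` every click costs five seconds regardless of how quickly the page responds. A polling wait returns as soon as the button is visible again, and the timeout is only paid once, at the end, when the button has gone for good.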