2017-03-07 101 views

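Generate a list of URLs with Selenium

I am trying to generate a list of URLs with Selenium. I would like the user to navigate through the instrumented browser and end up with a list of the URLs they visited.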

I found that the "current_url" property could help with this, but I did not find a way to know that the user clicked on a link.

In [117]: from selenium import webdriver 

In [118]: browser = webdriver.Chrome() 

In [119]: browser.get("http://stackoverflow.com") 

--> here, I click on the "Questions" link. 

In [120]: browser.current_url 

Out[120]: 'http://stackoverflow.com/questions' 

--> here, I click on the "Jobs" link. 

In [121]: browser.current_url 

Out[121]: 'http://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab' 

Any hints appreciated!

Thanks

Answer


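There isn't really an official way to monitor what a user is doing in Selenium. The only thing you can really do is start the driver, then run a loop that constantly checks driver.current_url. However, I don't know what the best way to exit this loop is, since I don't know what your use case is. Perhaps you could try: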

import time

from selenium import webdriver


urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    if driver.current_url != current:
        current = driver.current_url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, instead, if you only want to capture unique URLs:
        # if current not in urls:
        #     urls.append(current)

    # pause briefly so the loop does not spin at full speed
    time.sleep(0.2)
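
The loop above can record every URL, duplicates included, and the unique list can be derived afterwards rather than chosen up front. A minimal sketch, assuming Python 3.7+ (where dicts keep insertion order); `unique_in_order` is a hypothetical helper name, not part of Selenium:

```python
def unique_in_order(urls):
    """Return the URLs with duplicates dropped, keeping first-visit order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(urls))


visited = [
    'http://stackoverflow.com/',
    'http://stackoverflow.com/questions',
    'http://stackoverflow.com/',
]
print(unique_in_order(visited))
```

This keeps the full navigation history available while still giving a de-duplicated view when needed.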

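If you have no idea how to end this loop, I would suggest either having the user navigate to a URL that breaks the loop, such as http://www.endseleniumcheck.com, and adding it to the code like this: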

import time

from selenium import webdriver


urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    url = driver.current_url

    # the browser may report the URL with a trailing slash,
    # so compare the prefix rather than the exact string
    if url.startswith('http://www.endseleniumcheck.com'):
        break

    if url != current:
        current = url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, instead, if you only want to capture unique URLs:
        # if current not in urls:
        #     urls.append(current)

    # pause briefly so the loop does not spin at full speed
    time.sleep(0.2)
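
One caveat with comparing against a fixed string: browsers usually report the URL in normalized form (for example with a trailing slash), so an exact match can miss. Here is a sketch of a more tolerant check, assuming the stop page is identified by its host name only; `is_stop_url` and `STOP_HOST` are hypothetical names introduced for illustration:

```python
from urllib.parse import urlparse

STOP_HOST = 'www.endseleniumcheck.com'


def is_stop_url(url, stop_host=STOP_HOST):
    """True if url points at the stop host, whatever the scheme or path."""
    host = urlparse(url).hostname  # lower-cased host, or None if absent
    return host == stop_host


print(is_stop_url('http://www.endseleniumcheck.com/'))
print(is_stop_url('http://www.google.com/'))
```

In the loop, `if is_stop_url(driver.current_url): break` could then stand in for the string comparison.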

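Or, if you want to get crafty, you can terminate the loop when the user exits the browser. You can do this by monitoring the browser's process ID with the psutil library (pip install psutil):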

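import time

import psutil
from selenium import webdriver


urls = []

driver = webdriver.Firefox()
# PID of the Firefox process. driver.binary is available in older Selenium
# releases; newer geckodriver sessions may instead report the PID in
# driver.capabilities['moz:processID'].
pid = driver.binary.process.pid

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    # stop once the browser process has gone away
    if not psutil.pid_exists(pid):
        break

    if driver.current_url != current:
        current = driver.current_url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, instead, if you only want to capture unique URLs:
        # if current not in urls:
        #     urls.append(current)

    # pause briefly so the loop does not spin at full speed
    time.sleep(0.2)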

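Thank you very much! That will do. Personally, I ended up using a try/catch structure to handle the browser exit (which throws an exception). It's not clean, but it is enough for what I have to do. – reike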
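
The try/except approach mentioned in this comment might look roughly like the sketch below. It assumes that reading the URL raises an exception once the browser is gone; `record_urls`, `get_url`, `stop_on` and `fake_browser` are hypothetical names, and the URL source is passed in as a callable so the loop can be tried without a real browser.

```python
import time


def record_urls(get_url, stop_on=(Exception,), poll_interval=0.2):
    """Poll get_url() until it raises, returning each new URL in order."""
    urls = []
    current = None
    while True:
        try:
            url = get_url()
        except stop_on:
            # the browser was closed (or the session died): stop recording
            break
        if url != current:
            current = url
            urls.append(current)
        time.sleep(poll_interval)
    return urls


# Stand-in for a browser: gives three readings, then fails like a closed window.
def fake_browser():
    readings = iter(['http://a.example/', 'http://a.example/', 'http://b.example/'])

    def get_url():
        try:
            return next(readings)
        except StopIteration:
            raise RuntimeError('browser closed')

    return get_url


print(record_urls(fake_browser(), poll_interval=0))
```

With Selenium, this could be called as `record_urls(lambda: driver.current_url, stop_on=(WebDriverException,))`, with `WebDriverException` imported from `selenium.common.exceptions`. Narrowing `stop_on` this way keeps unrelated errors from being silently treated as a browser exit.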