2016-03-01 179 views

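Python - Selenium: searching for AngularJS elements in a loop with find_elements_by()

I am scraping real estate data. On sites generated with JavaScript, Selenium does an excellent job: you use

driver.find_elements_by... 

to find all the tags holding the relevant information and loop over them. On this site, however, the listing is produced by AngularJS. I tried the same approach: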

for article in driver.find_elements_by_css_selector("div.property.ng-scope"): 
    pass  # do something with each listing

I figured out that I have to make my webdriver (PhantomJS) click the link that leads to the page of the individual listing:

linkbase = article.find_element_by_css_selector("div.info.clear.ng-scope") 
link = linkbase.find_element_by_tag_name('a') 
link.click() 

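The webdriver then simply points at that page, and I can get all the information I want for that one listing.

As soon as the first pass through the loop finishes, I get the following error: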

> Message: {"errorMessage":"Element does not exist in cache","request":{"headers": 
{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close"," 
Content-Length":"142","Content-Type":"application/json;charset=UTF-8","Host":"12 
7.0.0.1:56577","User-Agent":"Python-urllib/3.4"},"httpVersion":"1.1","method":"P 
OST","post":"{\"sessionId\": \"f9ec2c10-dfd9-11e5-9d4c-3bbe8f5bf7c0\", \"using\" 
: \"css selector\", \"id\": \":wdc:1456856343349\", \"value\": \"div.info.clear. 
ng-scope\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"elemen 
t","directory":"/","path":"/element","relative":"/element","port":"","host":""," 
password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/ele 
ment","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/f9ec2c10-dfd9- 
11e5-9d4c-3bbe8f5bf7c0/element/:wdc:1456856343349/element"}} 

The element containing the link on the page is:

<a ng-href="/detail/prodej/dum/rodinny/jemnice-jemnice-/3800125532" ng-click="beforeOpen(i.iterator, i.regionTip)" class="title" href="/detail/prodej/dum/rodinny/jemnice-jemnice-/3800125532"> 
<span class="name ng-binding"> ... </a> 

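This is just the title text of each listing. I did set the user agent following this answer, even though it does not show up in the error. I also wait for the surrounding element to load first: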

wait = WebDriverWait(driver, getSearchResults_CZ.waiting) 
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.content"))) 

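What I want is to parse all of these property elements, save the links to the individual listings in a list, and then loop over that list, opening each link with driver.get(). I know that clicking a link changes the driver's URL, but I assumed that once the list of articles had been built with find_elements_by, it would serve as a stable reference point. Accessing the link by searching for the "a" tag and calling get_attribute('href') does not work in this case with the AngularJS framework. What am I not seeing?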

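Edit: As answered, get_attribute without .click() is the right way to go. My original mistake was in the CSS selector: I had been using "div[class^='property']" and got completely different links. It must have matched another element that I had not noticed before.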
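A likely reason the two selectors behave differently: `div.property` matches a div that has `property` as one of its class tokens, while `div[class^='property']` matches any div whose raw class attribute string starts with `property`. The following is a minimal sketch that emulates both rules in plain Python; the class strings other than those quoted in the question are hypothetical, not taken from the site.

```python
def matches_class_selector(class_attr, name):
    """Emulate `div.name`: `name` must be one of the whitespace-separated class tokens."""
    return name in class_attr.split()


def matches_prefix_selector(class_attr, prefix):
    """Emulate `div[class^='prefix']`: the raw attribute string must start with `prefix`."""
    return class_attr.startswith(prefix)


# Sample class attribute values; "property-list" and "ng-scope property"
# are hypothetical and only illustrate where the two rules diverge.
samples = [
    "property ng-scope",
    "property-list",
    "ng-scope property",
    "info clear ng-scope",
]

# Maps each class string to (matches div.property, matches div[class^='property']).
results = {
    s: (matches_class_selector(s, "property"), matches_prefix_selector(s, "property"))
    for s in samples
}

for class_attr, (by_class, by_prefix) in results.items():
    print(f"{class_attr!r}: div.property={by_class}, div[class^='property']={by_prefix}")
```

A prefix selector therefore also picks up elements whose class merely begins with the same letters, which would explain getting unrelated links.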

Answer


Wait for at least one "property" element to become visible, then grab the links:

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 

driver = webdriver.Firefox() 
driver.get("http://www.sreality.cz/hledani/prodej/domy?region=jemnice") 
driver.maximize_window() 

wait = WebDriverWait(driver, 10) 
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "property"))) 

links = [link.get_attribute("href") for link in driver.find_elements_by_css_selector("div.property div.info a")] 
print(links) 

driver.close() 

Works for me.
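To then open every listing with driver.get(), as the question describes, the hrefs can be collected as plain strings first, so that nothing depends on element references once the driver navigates away. This is only a sketch under assumptions: `collect_links`, `visit_all` and `handle` are hypothetical helper names, the selector is the one used above, and the generic `find_elements(by, value)` call is used with the locator string "css selector" (the value of `By.CSS_SELECTOR`).

```python
def collect_links(driver, selector="div.property div.info a"):
    """Return the unique, non-empty href values of all elements matching selector."""
    hrefs = []
    for element in driver.find_elements("css selector", selector):
        href = element.get_attribute("href")
        # Skip anchors without an href and links that were already collected.
        if href and href not in hrefs:
            hrefs.append(href)
    return hrefs


def visit_all(driver, links, handle):
    """Open each link with driver.get() and collect whatever handle(driver) returns."""
    results = []
    for url in links:
        driver.get(url)
        results.append(handle(driver))
    return results
```

With the driver from the snippet above, `links = collect_links(driver)` followed by `visit_all(driver, links, lambda d: d.title)` would return the page title of each listing; the waiting logic needed on each detail page is left out here.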


Works for me as well... not clicking is the right way to go. Otherwise Selenium loses the web objects it is supposed to loop over. – Thanados
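If clicking cannot be avoided, one common workaround is to look the elements up again on every pass instead of reusing references obtained before the navigation. The sketch below is an assumption-laden illustration, not something tested against this site: `click_each` and `handle` are hypothetical names, and it assumes that `driver.back()` returns to the same list in the same order and that the list has finished rendering by the time it is queried (in practice an explicit wait would be needed).

```python
def click_each(driver, selector, handle):
    """Click every element matching selector, re-locating the elements each time."""
    results = []
    index = 0
    while True:
        # Fresh lookup on every pass: references from before a navigation
        # are no longer valid in the driver's element cache.
        items = driver.find_elements("css selector", selector)
        if index >= len(items):
            break
        items[index].click()
        results.append(handle(driver))
        # Go back to the list page before handling the next element.
        driver.back()
        index += 1
    return results
```

Collecting the hrefs and using driver.get() remains simpler, since it needs no assumption about the list being rebuilt identically.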