Making an XPath more selective? [web scraping]

I'm trying to print some house prices and am having trouble with XPath. Here's my code:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome("my/path/here")

driver.get("https://www.realtor.com/realestateandhomes-search/?pgsz=10")
for house_number in range(1, 11):
    try:
        price = driver.find_element_by_xpath(
            '//*[@id="{}"]/div[2]/div[1]'.format(house_number))
        print(price.text)
    except NoSuchElementException:
        print("couldn't find")

I'm on this website, trying to print the prices of the first ten houses listed.

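In my output, every house that has a "NEW" label shows "NEW" instead of its actual price. But for the bottom two, which don't have the NEW sticker, the actual price is printed.

How can I write my XPath selector so that it selects the number instead of NEW?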


2017-10-05 thewhitetie

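You're on the right track; you've just written an XPath that is too brittle. I'd try to make it more verbose, without relying on indexes and wildcards.

Here's your XPath (I'm using id="1" for example purposes):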

//*[@id="1"]/div[2]/div[1]

And here's the HTML (some attributes/elements removed for brevity):

<li id="1"> 
    <div></div> 
    <div class="srp-item-body"> 
     <div>New</div><!-- this is optional! --> 
     <div class="srp-item-price">$100,000</div> 
    </div> 
</li>

First, replace the * wildcard with the element you expect to find there. This is just a way to make the XPath a bit more "self-documenting":

//li[@id="1"]/div[2]/div[1]

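Next, you want to target the second <div>, but instead of selecting it by index, try using the element's attributes where applicable, such as class: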

//li[@id="1"]/div[@class="srp-item-body"]/div[1]

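Finally, you want to target the price <div>. Because the "New" text sits in its own <div>, your XPath targets the first <div> ("New") rather than the <div> holding the price. When the "New" <div> isn't present, the price <div> is the first one, which is why your XPath does work for those listings.

We can use the same approach as in the previous step and target the element by attribute. This forces the XPath to always land on the price <div>: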
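To see why the index-based path picks up "New", here is a minimal offline sketch. It is not the answerer's code: it swaps Selenium for Python's standard-library xml.etree.ElementTree, which works here only because the trimmed markup above happens to be well-formed XML and ElementTree supports the small XPath subset involved (it needs relative paths, so // becomes .//). The listing helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def listing(with_badge):
    """Rebuild the trimmed listing markup (hypothetical helper)."""
    badge = "<div>New</div>" if with_badge else ""
    return ET.fromstring(
        '<ul><li id="1"><div></div><div class="srp-item-body">'
        + badge
        + '<div class="srp-item-price">$100,000</div></div></li></ul>'
    )


# The question's index-based path, made relative for ElementTree.
INDEX_PATH = './/*[@id="1"]/div[2]/div[1]'

print(listing(True).find(INDEX_PATH).text)   # New
print(listing(False).find(INDEX_PATH).text)  # $100,000
```

The badge shifts the price <div> from position 1 to position 2, so div[1] means a different element depending on the listing.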

//li[@id="1"]/div[@class="srp-item-body"]/div[@class="srp-item-price"]
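Under the same assumptions (ElementTree standing in for Selenium, a hypothetical listing helper rebuilding the trimmed markup), the attribute-based path lands on the price whether or not the badge is present:

```python
import xml.etree.ElementTree as ET


def listing(with_badge):
    """Rebuild the trimmed listing markup (hypothetical helper)."""
    badge = "<div>New</div>" if with_badge else ""
    return ET.fromstring(
        '<ul><li id="1"><div></div><div class="srp-item-body">'
        + badge
        + '<div class="srp-item-price">$100,000</div></div></li></ul>'
    )


# The attribute-based path, made relative for ElementTree.
CLASS_PATH = (
    './/li[@id="1"]/div[@class="srp-item-body"]'
    '/div[@class="srp-item-price"]'
)

for with_badge in (True, False):
    # Same element either way: the optional badge no longer shifts the match.
    print(listing(with_badge).find(CLASS_PATH).text)  # $100,000
```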

Hope this helps!

So... all that said, if you're interested in the prices and nothing else, this would probably work too :)

for price in driver.find_elements_by_class_name('srp-item-price'): 
    print(price.text)
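Roughly the same idea, sketched offline with ElementTree over two hypothetical listings (the second price is invented). One difference to keep in mind: Selenium's find_elements_by_class_name matches any element that has that class among others, whereas ElementTree's [@class="..."] predicate requires the attribute to equal the string exactly; for this markup the two agree.

```python
import xml.etree.ElementTree as ET

# Two hypothetical listings; only the first carries the "New" badge.
PAGE = (
    "<ul>"
    '<li id="1"><div></div><div class="srp-item-body">'
    '<div>New</div><div class="srp-item-price">$100,000</div></div></li>'
    '<li id="2"><div></div><div class="srp-item-body">'
    '<div class="srp-item-price">$85,000</div></div></li>'
    "</ul>"
)

root = ET.fromstring(PAGE)
# Collect every price div on the page, regardless of which listing it is in.
prices = [div.text for div in root.iterfind('.//div[@class="srp-item-price"]')]
print(prices)  # ['$100,000', '$85,000']
```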


2017-10-05 23:31:22

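Hi, thanks for the effort; I appreciate the comments and the thoughtful explanation. However, when I run that code I now get an error saying Selenium can't find the element at all (i.e., for any house)! I changed my code to: 'price = driver.get_element_by_xclass("""//li[@id="{}"]/div[@class="srp-item-body"]/div[@class="srp-item-price"]""".format(house_number))' and it throws an exception every time saying the element can't be found. – thewhitetie

It works in my Chrome console. Did you try using 'driver.find_element_by_xpath'? –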

You could write it like this, without loading images, which can speed up your scraping:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

# Don't load images (2 = block), which speeds up page loads
chrome_opt = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_opt.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(chrome_options=chrome_opt, executable_path="my/path/here")
driver.get("https://www.realtor.com/realestateandhomes-search/Bladen-County_NC/sby-6/pg-1?pgsz=10")
for house_number in range(1, 11):
    try:
        # Select the price div by class so an optional "New" div can't shift the match
        price = driver.find_element_by_xpath(
            '//*[@id="{}"]/div[2]/div[@class="srp-item-price"]'.format(house_number))
        print(price.text)
    except NoSuchElementException:
        print("couldn't find")
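This path still indexes the outer <div> but tests the inner one by class, so the optional "New" badge no longer matters, although a change in the number of outer <div>s would still break it. Below is a hedged offline sketch of the same loop, with ElementTree standing in for Selenium and two hypothetical listings; ElementTree's find returns None instead of raising, so a None check replaces the try/except.

```python
import xml.etree.ElementTree as ET

# Two hypothetical listings; only the first carries the "New" badge.
PAGE = (
    "<ul>"
    '<li id="1"><div></div><div class="srp-item-body">'
    '<div>New</div><div class="srp-item-price">$100,000</div></div></li>'
    '<li id="2"><div></div><div class="srp-item-body">'
    '<div class="srp-item-price">$85,000</div></div></li>'
    "</ul>"
)

root = ET.fromstring(PAGE)
results = []
for house_number in range(1, 4):  # id 3 does not exist in this sample
    price = root.find(
        './/*[@id="{}"]/div[2]/div[@class="srp-item-price"]'.format(house_number))
    results.append(price.text if price is not None else "couldn't find")

print(results)  # ['$100,000', '$85,000', "couldn't find"]
```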


2017-10-06 02:45:00 kerberos

I found the same solution as above. – Sagar007

You can try this code:

from selenium import webdriver 
driver = webdriver.Chrome() 
driver.maximize_window() 
driver.get("https://www.realtor.com/realestateandhomes-search/Bladen-County_NC/sby-6/pg-1?pgsz=10") 

prices=driver.find_elements_by_xpath('//*[@class="data-price-display"]') 

for price in prices: 
    print(price.text)

It will print:

$39,900 
$86,500 
$39,500 
$40,000 
$179,000 
$31,000 
$104,900 
$94,900 
$54,900 
$19,900
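A general caveat when matching on class in XPath, which applies to the //*[@class="data-price-display"] expression above: @class="..." compares the entire attribute value, so it silently misses an element that carries more than one class. The sketch below uses ElementTree and invented markup (the two-class span is an assumption, not realtor.com's actual HTML). In a browser's XPath 1.0 engine the usual workaround is contains(concat(' ', normalize-space(@class), ' '), ' data-price-display '); ElementTree lacks those functions, so here the token check is done in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the price span carries two classes.
root = ET.fromstring(
    '<ul><li><span class="price data-price-display">$39,900</span></li></ul>'
)

# [@class="..."] compares the whole attribute value, so this finds nothing.
exact = root.findall('.//span[@class="data-price-display"]')

# Token-based check done in Python, similar to what a CSS class selector does.
by_token = [
    el for el in root.iter()
    if "data-price-display" in el.get("class", "").split()
]

print(len(exact), [el.text for el in by_token])  # 0 ['$39,900']
```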

Let me know if you need any other details.

2017-10-06 06:49:09 thebadguy
