Extracting .jpg links with XPath

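I'm trying to extract the links to the .jpg images from the following page using XPath: https://asheville.craigslist.org/search/sss

If you look at the nested nodes, there is a node containing the link I need to extract.

I'm new to Scrapy and XPath, and I can't seem to get anything other than an empty list back.

I've tried many variations of this code without any luck: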

response.xpath('//*[@id="sortable-results"]/ul/li/a/img/')

Source

2017-04-20 Keenan Burke-Pitts

Please share your current code. – Andersson

See the code above that I've been trying. Thanks! –

Try the following XPath expression to get the image source links:

//div[@id="sortable-results"]//img/@src

Source

2017-04-20 14:24:41 Andersson

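When I use response.xpath('//div[@id="sortable-results"]//img/@src') it still returns an empty list. –

That's because the required content is dynamic: it is generated by JavaScript. The XPath itself is correct :) – Andersson

Thanks for the clarification! –

It looks like the data is hidden in the data-ids attribute of the <a> node and is later unpacked by JavaScript into the image gallery.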

<a href="/cto/6095960745.html" class="result-image gallery"
data-ids="1:01414_7WJQELsYuex,1:00t0t_kxF99J8uXmP,1:00S0S_dgnLA6FvDKX,1:00404_kTP1mB2Flpb,1:00P0P_j5On1SCHLuP,1:00a0a_jZYNazvdTgo,1:00Y0Y_9HJf6UJJVg7,1:00p0p_loCrLMXpS5s,1:00k0k_3e296xxBfXi,1:00f0f_5QpRYaBnIK7,1:00e0e_aZTOihYtz9C,1:00c0c_iatoB70CmWg,1:00X0X_dwt0ZbxYJNC,1:00k0k_k3dPBZpN9KM,1:00W0W_f51jQcPO86R">
<span class="result-price">$1700</span>
</a>
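To illustrate why an XPath that targets img returns nothing, here is a minimal sketch using only Python's standard-library html.parser instead of Scrapy. The RAW_HTML string is a shortened copy of the snippet above, and the GalleryProbe class is a hypothetical helper made up for this example. It counts img tags in the static markup and collects any data-ids values it finds.

```python
from html.parser import HTMLParser

# Static HTML roughly as the server sends it (shortened from the snippet above).
# Note there is no <img> element here, only a data-ids attribute on <a>.
RAW_HTML = '''
<a href="/cto/6095960745.html" class="result-image gallery"
   data-ids="1:01414_7WJQELsYuex,1:00t0t_kxF99J8uXmP,1:00S0S_dgnLA6FvDKX">
  <span class="result-price">$1700</span>
</a>
'''


class GalleryProbe(HTMLParser):
    """Hypothetical helper: counts <img> tags and collects data-ids values."""

    def __init__(self):
        super().__init__()
        self.img_count = 0
        self.data_ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.img_count += 1
        if tag == "a" and "data-ids" in attrs:
            self.data_ids.append(attrs["data-ids"])


probe = GalleryProbe()
probe.feed(RAW_HTML)
print(probe.img_count)  # number of <img> tags in the static markup
print(probe.data_ids)   # raw data-ids strings, one per <a>
```

Since the img elements only exist after the page's JavaScript has run, a selector aimed at them has nothing to match in the response Scrapy downloads, while the data-ids attribute is available right away.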

We can reverse-engineer this by extracting the IDs and then formatting the image URLs ourselves, like this:

# each matching <a> carries one comma-separated data-ids string
ids = response.xpath("//a[@class='result-image gallery']/@data-ids").extract()
# join with commas so IDs from different listings don't run together
ids = ','.join(ids).split(',')
template = "https://images.craigslist.org/{}_300x300.jpg"
for img_id in ids:
    # e.g. '1:00G0G_anZn4IdI4pK'
    # we want to get rid of the '1:' part
    img_id = img_id.split(':')[-1]
    url = template.format(img_id)
    print(url)
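The same idea can be written as a small standalone function that runs without Scrapy, which may make it easier to keep each listing's images together. This is only a sketch: the image_urls name and the listings dictionary are hypothetical, and the URL template is the one from the answer above, which may no longer match what the site serves.

```python
def image_urls(data_ids, template="https://images.craigslist.org/{}_300x300.jpg"):
    """Turn one data-ids attribute value into a list of image URLs."""
    urls = []
    for raw_id in data_ids.split(","):
        raw_id = raw_id.strip()
        if not raw_id:
            continue  # skip empty pieces, e.g. when data_ids is ""
        img_id = raw_id.split(":")[-1]  # drop the leading "1:" prefix
        urls.append(template.format(img_id))
    return urls


# Hypothetical input: listing href mapped to its data-ids value,
# using the first two IDs from the snippet above.
listings = {"/cto/6095960745.html": "1:01414_7WJQELsYuex,1:00t0t_kxF99J8uXmP"}

for href, data_ids in listings.items():
    for url in image_urls(data_ids):
        print(href, url)
```

In a spider, the data-ids value for each listing would come from the @data-ids XPath shown above, applied to one <a> node at a time rather than to the whole page.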

Source

2017-04-20 06:08:27 Granitosaurus

Thanks for the reply. I need to extract the .jpg hyperlinks contained in a node nested inside the node. –
