Scrapy: scrape raw HTML source code

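I am using Scrapy to crawl and scrape websites. I need the whole HTML rather than individual components. Components are easy to extract with XPath selectors, but is there a way to extract the entire HTML block for a given class? For example, in the HTML below, I need the exact HTML source of the whole div block prod-basic-info. Is there any way to do this?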

<div class="block prod-basic-info"> 
<h2>Product information</h2> 
<p class="product-info-label">Category</p> 
    <p> 
    <a href="xyz.html"></a> 
</p> 
</div>

Source

2015-02-09 sulav_lfc

Just select the element with your XPath expression or CSS selector and call extract() on it:

response.xpath('//div[contains(@class, "prod-basic-info")]').extract()[0] 
response.css('div.prod-basic-info').extract()[0]

Source

2015-02-09 05:35:09 alecxe
