Getting "Next Page" data with Scrapy

I need to scrape the reviews from a shopping site, but the reviews are paginated: each page shows 10 reviews and there are about 100 pages. How can I crawl all of them?

My plan is to use yield with Request to follow the "Next Page" link and then extract the data with XPath, but I can't get to the next page to extract its data.

Here is the HTML around the "Next Page" link:

<div class="xs-pagebar clearfix"> 
    <div class="Pagecon"> 
      <div class="Pagenum"> 
       <a class="pre-page pre-disable"> 
       <a class="pre-page pre-disable"> 
       <span class="curpage">1</span> 
       <a href="#" onclick="tosubmits(2);return false;">2</a> 
       <a href="#" onclick="tosubmits(3);return false;">3</a> 
       <span class="elli">...</span> 
       <a href="#" class="next-page" onclick="tosubmits('2');return false;">Next Page</a> 
       <a href="#" onclick="tosubmits('94');return false;">Final Page</a> 
      </div> 
    </div> 
</div>
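In this markup every pagination link has href="#", so the only place the target page number appears is the argument to tosubmits() in the onclick attribute. As a minimal sketch using only the Python standard library (the regular expression and the names PagebarParser and parse_pagebar are illustrative, not part of the site or of Scrapy), the next and last page numbers can be read out of the bar like this:

```python
import re
from html.parser import HTMLParser

# Matches the page argument in handlers such as tosubmits('2') or tosubmits(3).
PAGE_ARG = re.compile(r"tosubmits\(\s*'?(\d+)'?\s*\)")


class PagebarParser(HTMLParser):
    """Collects the page number from each <a> onclick handler in the pagination bar."""

    def __init__(self):
        super().__init__()
        self.next_page = None
        self.pages = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        match = PAGE_ARG.search(attrs.get("onclick") or "")
        if not match:
            return  # e.g. the disabled "previous" links carry no onclick
        page = int(match.group(1))
        self.pages.append(page)
        if "next-page" in (attrs.get("class") or "").split():
            self.next_page = page


def parse_pagebar(html):
    """Return (next_page, last_page) as ints, or None where not found."""
    parser = PagebarParser()
    parser.feed(html)
    parser.close()
    last_page = max(parser.pages) if parser.pages else None
    return parser.next_page, last_page
```

Inside a Scrapy callback the same regular expression could be applied with response.xpath('//a[contains(@class, "next-page")]/@onclick').re_first(...). The number alone is still not a URL, though: what request tosubmits() actually sends is defined in the site's JavaScript.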

What exactly does href="#" mean?


2014-11-06 samlong

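Unfortunately, you can't do this with Scrapy. href="#" is an anchor link that points nowhere; it is only there to make the element look like a link. What really happens is that the JavaScript onclick handler runs. For your use case you need something that executes JavaScript. You may want to look at Splinter for this.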
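To see why following that link gets nowhere: a bare "#" resolved against the page's own URL gives back the same page. A small standard-library sketch (the example URL is made up, and resolve_link is an illustrative helper, not a Scrapy function):

```python
from urllib.parse import urldefrag, urljoin


def resolve_link(page_url, href):
    """Resolve an href against the page it appears on, dropping any #fragment."""
    return urldefrag(urljoin(page_url, href)).url
```

With page_url = "https://shop.example.com/item?id=1", resolve_link(page_url, "#") returns page_url itself, so a crawler following every "Next Page" href only ever revisits page 1.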


2014-11-06 14:17:27

Thanks for the explanation. Given that, do you know of any other way to get this done? I've been stuck on this for several days. – samlong 2014-11-06 14:29:10

As I said, you can use Splinter, or look in Chrome's developer tools to see what the JavaScript calls: http://stackoverflow.com/questions/8550114/can-scrapy-be-used-to-scrape-dynamic-content-from-websites-that-are-using-ajax – 2014-11-06 14:44:25
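Following the developer-tools route: if the Network tab shows that tosubmits(n) simply requests the same review list with a page number, the pages can be requested directly instead of "clicking" Next Page. The sketch below assumes a hypothetical endpoint https://shop.example.com/comments that takes productId and page query parameters; the real URL, HTTP method, and parameter names have to be read from the browser's developer tools.

```python
from urllib.parse import urlencode

# HYPOTHETICAL endpoint and parameter names: replace them with whatever
# request tosubmits() actually sends, as shown in the browser's Network tab.
BASE_URL = "https://shop.example.com/comments"


def page_urls(product_id, last_page):
    """Return one URL per review page, from page 1 through last_page inclusive."""
    return [
        f"{BASE_URL}?{urlencode({'productId': product_id, 'page': page})}"
        for page in range(1, last_page + 1)
    ]
```

In a Scrapy spider each of these URLs would be yielded as scrapy.Request(url, callback=...); if tosubmits() turns out to POST a form, scrapy.FormRequest with formdata is the closer fit. Taking last_page from the "Final Page" link (94 in the HTML above) avoids hard-coding the page count.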

Thanks a lot! Using Splinter, I solved the problem! Splinter is a powerful tool for dealing with dynamic web pages, and I really like it! – samlong 2014-11-09 12:37:30
