Scrapy CrawlSpider - 不能按照特定的链接或自定义的处理器

解析我一直在使用Scrapy并试图遵循例子只能跟着网址匹配某种正则表达式的那个。Scrapy CrawlSpider - 不能按照特定的链接或自定义的处理器

我不是一个Python开发，但我已经尝试了很多方法，试图让这是怎么回事。

我在Scrapy文档中使用了示例URL，并且从CrawlSpider延伸并通过LinkExtractor实现了规则。

目前，我想只使用一个自定义的解析器对任何URL的包含在他们所说的“朋友”。

** Scrapy Python的蜘蛛**

import scrapy 

from scrapy.spiders import CrawlSpider, Rule 
from scrapy.linkextractors import LinkExtractor 

class MySpider(CrawlSpider): 
    name = 'example' 

    allowed_domains = ['quotes.toscrape.com'] 

    start_urls = ['http://quotes.toscrape.com'] 

    rules = [ 
     Rule(LinkExtractor(allow='(friends)'), callback='parse_custom') 
    ] 

    def parse(self, response): 

     self.logger.info('1111111111111 - Parsing General URL! %s', response.url) 

     for href in response.css('a::attr(href)'): 
      yield response.follow(href, callback=self.parse) 

    def parse_custom(self, response): 
     # I have never been able to get this to call 
     self.logger.info('2222222222222 - Parsing CUSTOM URL! %s', response.url) 

     for href in response.css('a::attr(href)'): 
      yield response.follow(href, callback=self.parse)

日志文件

2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/miracles/page/1/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/miracle/page/1/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/live/page/1/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/life/page/1/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/inspirational/page/1/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/choices/page/1/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/abilities/page/1/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/simile/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/miracles/page/1/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/miracle/page/1/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/live/page/1/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/life/page/1/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/inspirational/page/1/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/choices/page/1/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/abilities/page/1/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/simile/ 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/truth/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Marilyn-Monroe/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/friends/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/friendship/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/reading/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/books/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/humor/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Jane-Austen/> (referer: http://quotes.toscrape.com) 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/truth/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/author/Marilyn-Monroe/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/friends/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/friendship/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/reading/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/books/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/humor/ 
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/author/Jane-Austen/

来源

2017-07-30 David Cruwys

从documentation：

当编写蜘蛛抓取规则，避免使用parse回调，由于CrawlSpider使用parse方法本身来实现它的逻辑。因此，如果您覆盖parse方法，抓取蜘蛛将不再起作用。

来源

2017-07-30 12:04:12

谢谢，我甚至有一部分记得在阅读文档时阅读这些内容。 –

Scrapy CrawlSpider - 不能按照特定的链接或自定义的处理器

回答

相关问题