0
解析我一直在使用Scrapy并试图遵循例子只能跟着网址匹配某种正则表达式的那个。Scrapy CrawlSpider - 不能按照特定的链接或自定义的处理器
我不是一个Python开发,但我已经尝试了很多方法,试图让这是怎么回事。
我在Scrapy文档中使用了示例URL,并且从CrawlSpider
延伸并通过LinkExtractor
实现了规则。
目前,我想只使用一个自定义的解析器对任何URL的包含在他们所说的“朋友”。
** Scrapy Python的蜘蛛**
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
class MySpider(CrawlSpider):
name = 'example'
allowed_domains = ['quotes.toscrape.com']
start_urls = ['http://quotes.toscrape.com']
rules = [
Rule(LinkExtractor(allow='(friends)'), callback='parse_custom')
]
def parse(self, response):
self.logger.info('1111111111111 - Parsing General URL! %s', response.url)
for href in response.css('a::attr(href)'):
yield response.follow(href, callback=self.parse)
def parse_custom(self, response):
# I have never been able to get this to call
self.logger.info('2222222222222 - Parsing CUSTOM URL! %s', response.url)
for href in response.css('a::attr(href)'):
yield response.follow(href, callback=self.parse)
日志文件
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/miracles/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/miracle/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/live/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/life/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/inspirational/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/choices/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/abilities/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/simile/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/miracles/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/miracle/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/live/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/life/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/inspirational/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/choices/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/abilities/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/simile/
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/truth/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Marilyn-Monroe/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/friends/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/friendship/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/reading/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/books/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/humor/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Jane-Austen/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/truth/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/author/Marilyn-Monroe/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/friends/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/friendship/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/reading/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/books/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/humor/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/author/Jane-Austen/
谢谢,我甚至有一部分记得在阅读文档时阅读这些内容。 –