Python Scrapy function call

I am trying to call a getNext() function from the main parse function that Scrapy calls, but it never gets called.

class BlogSpider(scrapy.Spider):
    # User agent.
    name = 'Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19'
    start_urls = ['http://www.tricksforums.org/best-free-movie-streaming-sites-to/']

    def getNext(self):
        print("Getting next ... ")
        # Check if next link in DB is valid and crawl.
        try:
            nextUrl = myDb.getNextUrl()
            urllib.urlopen(nextUrl).getcode()
            yield scrapy.Request(nextUrl['link'])
        except IOError as e:
            print("Server can't be reached", e.code)
            yield self.getNext()

    def parse(self, response):
        print("Parsing link: ", response.url)
        # Get all urls for further crawling.
        all_links = hxs.xpath('*//a/@href').extract()
        for link in all_links:
            if validators.url(link) and not myDb.existUrl(link) and not myDb.visited(link):
                myDb.addUrl(link)
        print("Getting next?")
        yield self.getNext()

I tried with and without yield in front of the call.. What is the problem? And what is this yield supposed to be? :)

2017-06-19 Alessandro

What do you see printed on the console? – alecxe

'('Parsing link: ', 'http://www.tricksforums.org/best-free-movie-streaming-sites-to/') Getting next?' That's what I get :) – Alessandro

So, you do see "Getting next" printed... which means getNext() is executed, right? Thanks. – alecxe

You are trying to yield a generator, but you meant to yield from a generator.

If you are on Python 3.3+, you can use yield from:

yield from self.getNext()

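Or simply do return self.getNext(), so that parse (with no yield of its own) hands the generator back for Scrapy to iterate over.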
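The difference between these options can be seen without Scrapy. The following is a minimal sketch in plain Python 3, not the asker's spider: get_next, the parse_* functions, the calls list and the "request-1"/"request-2" strings are all hypothetical stand-ins for getNext(), parse() and scrapy.Request objects. It also shows why the body of the generator function never seems to run: calling a generator function only creates a generator object, and its body starts executing when something iterates over it.

```python
import types

calls = []  # records each time the body of get_next() actually starts running


def get_next():
    # Hypothetical stand-in for the spider's getNext().
    calls.append("get_next body ran")
    yield "request-1"
    yield "request-2"


def parse_yield():
    # The pattern from the question: yields the generator object itself,
    # as one single item, without ever running its body.
    yield get_next()


def parse_yield_from():
    # Python 3.3+: delegates to the inner generator and re-yields its items.
    yield from get_next()


def parse_return():
    # No yield here, so this is a plain function returning the generator.
    return get_next()


def parse_loop():
    # Explicit loop, equivalent to "yield from" for this simple case;
    # also works on Python versions older than 3.3.
    for request in get_next():
        yield request


if __name__ == "__main__":
    wrapped = list(parse_yield())
    print(isinstance(wrapped[0], types.GeneratorType), calls)
    print(list(parse_yield_from()))
    print(list(parse_return()))
    print(list(parse_loop()))
```

Iterating parse_yield() produces one item, the un-started generator, which matches the "got 'generator'" error Scrapy reports. The other three variants hand the consumer the individual items.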

2017-06-19 19:38:03 alecxe

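Yep, that worked :). But I still don't understand it well.. – Alessandro

@Alessandro you should also have noticed this message on the console: '2017-06-19 15:42:49 [scrapy.core.scraper] ERROR: Spider must return Request, BaseItem, dict or None, got 'generator' in' - please see [this SO topic](https://stackoverflow.com/q/1756096/771848) to learn about generators. Thanks! – alecxe

I had the "--nolog" flag on.. yep – Alessandro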
