2017-06-19 127 views
2

Python Scrapy function call

I'm trying to call the getNext() function from the main parse function that Scrapy calls, but it never gets called.

class BlogSpider(scrapy.Spider):
    # User agent.
    name = 'Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19'
    start_urls = ['http://www.tricksforums.org/best-free-movie-streaming-sites-to/']

    def getNext(self):
        print("Getting next ... ")
        # Check if next link in DB is valid and crawl.
        try:
            nextUrl = myDb.getNextUrl()
            urllib.urlopen(nextUrl).getcode()
            yield scrapy.Request(nextUrl['link'])
        except IOError as e:
            print("Server can't be reached", e.code)
            yield self.getNext()

    def parse(self, response):
        print("Parsing link: ", response.url)
        # Get all urls for further crawling.
        all_links = hxs.xpath('*//a/@href').extract()
        for link in all_links:
            if validators.url(link) and not myDb.existUrl(link) and not myDb.visited(link):
                myDb.addUrl(link)
        print("Getting next?")
        yield self.getNext()

I've tried it with and without the yield in front.. what's the problem? And what is this yield supposed to do? :)

+0

What gets printed on the console? – alecxe

+0

'('Parsing link: ', 'http://www.tricksforums.org/best-free-movie-streaming-sites-to/') Getting next?' That's what I get :) – Alessandro

+0

So, you do see the "Getting next" print... which means getNext() is executed, right? Thanks. – alecxe

Answers

1

You are trying to yield a generator, but you meant to yield from a generator.

If you are on Python 3.3+, you can use yield from:

yield from self.getNext() 

Or, simply do return self.getNext().
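The difference can be seen without Scrapy at all. The following is only a minimal sketch: get_next() and the three parse_* functions are hypothetical stand-ins for the methods in the question, and the string "request-for-next-url" stands in for a scrapy.Request. It shows that yielding a generator hands the consumer the generator object itself, and the body of get_next() never runs, which would explain why the "Getting next ... " print never appears.

```python
import types

calls = []  # records each time the body of get_next() actually runs


def get_next():
    # Hypothetical stand-in for getNext() in the question.
    calls.append("get_next body ran")
    yield "request-for-next-url"


def parse_with_yield():
    # Yields the generator object itself, not the items it would produce.
    yield get_next()


def parse_with_yield_from():
    # Python 3.3+: delegates to the inner generator and re-yields its items.
    yield from get_next()


def parse_with_return():
    # No yield in the body, so this is a plain function that
    # hands the inner generator straight to the caller.
    return get_next()


inner = next(parse_with_yield())
print(isinstance(inner, types.GeneratorType), calls)  # True []

print(list(parse_with_yield_from()))  # ['request-for-next-url']
print(list(parse_with_return()))      # ['request-for-next-url']
```

Note that return self.getNext() only behaves this way when parse contains no other yield, as is the case in the question. The yield self.getNext() inside getNext's except branch has the same problem as the one in parse.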

+0

Yes, that works :). But I still don't have a good handle on it.. – Alessandro

+1

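@Alessandro you should also have noticed this message on the console: "2017-06-19 15:42:49 [scrapy.core.scraper] ERROR: Spider must return Request, BaseItem, dict or None, got 'generator' in ..." - check out [this SO topic](https://stackoverflow.com/q/1756096/771848) to learn about generators. Thanks! – alecxe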

+1

I had the "--nolog" flag on.. yeah – Alessandro