Python, Scrapy, Pipeline: the "process_item" function is not being called
2015-07-10

I have a very simple piece of code, shown below. The scraping works and I can see all the print statements producing the correct data. In the pipeline, initialization runs fine. However, the process_item function is never called: the print statement at the start of that function never executes.

Spider: comosham.py

import scrapy 
from scrapy.spider import Spider 
from scrapy.selector import Selector 
from scrapy.http import Request 
from activityadvisor.items import ComoShamLocation 
from activityadvisor.items import ComoShamActivity 
from activityadvisor.items import ComoShamRates 
import re 


class ComoSham(Spider): 
    name = "comosham" 
    allowed_domains = ["www.comoshambhala.com"] 
    start_urls = [ 
        "http://www.comoshambhala.com/singapore/classes/schedules", 
        "http://www.comoshambhala.com/singapore/about/location-contact", 
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes", 
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes" 
    ] 

    def parse(self, response): 
        category = (response.url)[39:44] 
        print 'in parse' 
        if category == 'class': 
            pass 
            # self.gen_req_class(response) 
        elif category == 'about': 
            print 'about to call parse_location' 
            self.parse_location(response) 
        elif category == 'rates': 
            pass 
            # self.parse_rates(response) 
        else: 
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D' 

    def parse_location(self, response): 
        print 'in parse_location' 
        item = ComoShamLocation() 
        item['category'] = 'location' 
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract() 
        item['address'] = loc[2] + loc[3] + loc[4] + (loc[5])[1:11] 
        item['pin'] = (loc[5])[11:18] 
        item['phone'] = (loc[9])[6:20] 
        item['fax'] = (loc[10])[6:20] 
        item['email'] = loc[12] 
        print item['address'], item['pin'], item['phone'], item['fax'], item['email'] 
        return item 

Items file:

import scrapy 
from scrapy.item import Item, Field 

class ComoShamLocation(Item): 
    address = Field() 
    pin = Field() 
    phone = Field() 
    fax = Field() 
    email = Field() 
    category = Field() 

Pipeline file:

import csv 

class ComoShamPipeline(object): 
    def __init__(self): 
        self.locationdump = csv.writer(open('./scraped data/ComoSham/ComoshamLocation.csv', 'wb')) 
        self.locationdump.writerow(['Address', 'Pin', 'Phone', 'Fax', 'Email']) 

    def process_item(self, item, spider): 
        print 'processing item now' 
        if item['category'] == 'location': 
            print item['address'], item['pin'], item['phone'], item['fax'], item['email'] 
            self.locationdump.writerow([item['address'], item['pin'], item['phone'], item['fax'], item['email']]) 
        else: 
            pass 

Do you yield an item at the end of the 'parse_location' function, and does it have its values? – GHajba


Yes, at the end of 'parse_location' I am printing it and the output is as expected. –


I assume you have, but I have to ask: did you configure the ItemPipeline in 'settings.py'? – GHajba

Answers

Your problem is that you never actually yield the item. parse_location returns an item back to parse, but parse never yields that item.

The fix is to replace:

self.parse_location(response) 

with:

yield self.parse_location(response) 

To be more specific: process_item is never called unless items are actually yielded.
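
As a minimal sketch (only the relevant branch is shown; the rest of parse stays exactly as in the question), the 'about' branch becomes:

def parse(self, response): 
    category = (response.url)[39:44] 
    # ... other branches unchanged ... 
    if category == 'about': 
        # yield hands the item to the engine, which then routes it 
        # through every pipeline configured in ITEM_PIPELINES 
        yield self.parse_location(response) 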


Use ITEM_PIPELINES in settings.py:

ITEM_PIPELINES = ['project_name.pipelines.pipeline_class'] 
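
Note that newer Scrapy versions expect a dict mapping the pipeline's class path to an ordering number rather than a list. Assuming the project is named activityadvisor (as the imports in the question suggest), that would look like:

# settings.py -- assuming the project is named "activityadvisor" 
ITEM_PIPELINES = { 
    'activityadvisor.pipelines.ComoShamPipeline': 300, 
} 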

Adding to the answers above:

1. Remember to add the following line to settings.py! ITEM_PIPELINES = {'[YOUR_PROJECT_NAME].pipelines.[YOUR_PIPELINE_CLASS]': 300}
2. Yield the item when your spider runs!


Corrected ['YOUR_PROJECT_NAME] to '[YOUR_PROJECT_NAME]'. –


This solved my problem: another pipeline was dropping all my items before my pipeline was called, so process_item() never ran, although open_spider and close_spider were called. My solution was simply to change the ordering so that this pipeline runs before the other pipeline that drops items.
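
A sketch of how that ordering is expressed in settings.py. DropItemsPipeline is a hypothetical name for the pipeline that discards items; Scrapy runs pipelines in ascending order of their numbers:

# settings.py -- DropItemsPipeline is hypothetical; lower numbers run first 
ITEM_PIPELINES = { 
    'activityadvisor.pipelines.ComoShamPipeline': 100,   # runs first, sees every item 
    'activityadvisor.pipelines.DropItemsPipeline': 200,  # may drop items afterwards 
} 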

See the Scrapy Pipeline documentation: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

Just remember: Scrapy only calls Pipeline.process_item() when there are items to process!