Scrapy: missing XPath items cause the wrong data to be written to my pipeline

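This is a beginner's question (I'm new to Scrapy, and this is only my second question here).

I currently have a spider that crawls the following Amazon page (http://www.amazon.co.uk/Televisions-TVs-LED-LCD-Plasma/b/ref=sn_gfs_co_auto_560864_1?ie=UTF8&node=560864).

I want to scrape each TV's title and its main (listed) price. I can parse the TV names successfully. However, the listed TVs do not all share the same XPath elements: some have a main (listed) price, some have a "new" price, and some also have a "used" price.

My problem is that when a TV has no main (listed) price, my CSV output does not record NULL for that item. Instead, it takes the price from the next XPath match that does have a main price.

Is there a way to check whether an element exists in the XPath results and, if it doesn't, have the spider or the pipeline record NULL or ""?

The code for my main spider is: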

import scrapy
from scrapy.selector import Selector
from amazon.items import AmazonItem  # assumption: the project package is named "amazon"

class AmazonSpider(scrapy.Spider):  # BaseSpider in older Scrapy releases
    name = "amazon"
    allowed_domains = ["amazon.co.uk"]
    start_urls = [
        "http://www.amazon.co.uk/Televisions-TVs-LED-LCD-Plasma/b/ref=sn_gfs_co_auto_560864_1?ie=UTF8&node=560864"
    ]

    def parse(self, response):
        sel = Selector(response)
        # Each query runs over the whole page and returns one flat list.
        titles = sel.xpath('.//*[starts-with(@id,"result_")]/h3/a/span/text()').extract()
        prices = sel.xpath('.//*[starts-with(@id,"result_")]/ul/li[1]/div/a/span/text()').extract()

        items = []
        # zip() pairs the two lists by position only.
        for title, price in zip(titles, prices):
            item = AmazonItem()
            item["title"] = title.strip()
            item["price"] = price.strip()
            items.append(item)
        return items
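Why the prices shift can be reproduced without Scrapy. The sketch below is a minimal illustration, assuming a made-up, simplified version of the result markup (the real Amazon page differs) and using the standard library's `xml.etree.ElementTree` instead of Scrapy selectors; `results` and `flat_pairs` are hypothetical helpers. It collects titles and prices into two independent flat lists, as the spider's two page-wide XPath queries do, and then zips them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: TV B has no listed price, only a
# "new" price under a different path.
PAGE = """<html><body>
<div id="result_0"><h3><a><span>TV A</span></a></h3>
  <ul><li><div><a><span>£299</span></a></div></li></ul></div>
<div id="result_1"><h3><a><span>TV B</span></a></h3>
  <ul><li><a><span>£199</span></a></li></ul></div>
<div id="result_2"><h3><a><span>TV C</span></a></h3>
  <ul><li><div><a><span>£449</span></a></div></li></ul></div>
</body></html>"""


def results(root):
    """Return every <div> whose id starts with "result_", in document order."""
    return [d for d in root.iter("div") if d.get("id", "").startswith("result_")]


def flat_pairs(root):
    """Build two flat lists, as the spider's page-wide queries do, then zip them."""
    titles, prices = [], []
    for res in results(root):
        titles += [s.text for s in res.findall("./h3/a/span")]
        prices += [s.text for s in res.findall("./ul/li[1]/div/a/span")]
    # zip() pairs by position only, so one missing price shifts every later one.
    return list(zip(titles, prices))


pairs = flat_pairs(ET.fromstring(PAGE))
print(pairs)  # [('TV A', '£299'), ('TV B', '£449')]
```

TV B is paired with TV C's price and TV C is dropped, which matches the symptom described in the question.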

My pipeline is:

class AmazonPipeline(object):
    def process_item(self, item, spider):
        return item
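On letting the pipeline record a placeholder: a pipeline receives one item at a time, so it cannot undo a price that has already been paired with the wrong title, but it can fill in fields the spider left empty. A minimal sketch, assuming the hypothetical class name `DefaultValuesPipeline` and the placeholder `"N/A"`:

```python
class DefaultValuesPipeline(object):
    """Hypothetical pipeline that fills empty or missing fields with a placeholder."""

    fields = ("title", "price")
    placeholder = "N/A"

    def process_item(self, item, spider):
        for field in self.fields:
            # item.get() works for both dicts and scrapy.Item objects;
            # an empty string or an empty list counts as missing here.
            if not item.get(field):
                item[field] = self.placeholder
        return item


pipeline = DefaultValuesPipeline()
print(pipeline.process_item({"title": "TV B"}, spider=None))
# {'title': 'TV B', 'price': 'N/A'}
```

To run inside a crawl, the class would also need to be enabled through the `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"amazon.pipelines.DefaultValuesPipeline": 300}`, where the module path is an assumption about the project layout.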

My items file is:

import scrapy

class AmazonItem(scrapy.Item):
    title = scrapy.Field()
    price = scrapy.Field()

I'm exporting to CSV as follows: scrapy crawl amazon -o output.csv -t csv

Thanks in advance!


2015-04-12 Predica

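To avoid this, select each result block first and run the title and price XPaths relative to it, so that a missing price stays with its own TV. Have a look at the code below and compare the XPaths; it may help: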

def parse(self, response):
    # One selector per search result, so a missing field stays with its own result.
    selector_object = response.xpath('//div[starts-with(@id,"result_")]')
    for select in selector_object:
        # The leading "./" keeps each query relative to the current result block.
        title = select.xpath('./h3/a/span/text()').extract()
        title = title[0].strip() if title else 'N/A'
        price = select.xpath('./ul/li[1]/div/a/span/text()').extract()
        price = price[0].strip() if price else 'N/A'
        item = AmazonItem(
            title=title,
            price=price
        )
        yield item
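The per-result idea can be checked offline. This is a minimal, Scrapy-free sketch, assuming a made-up, simplified version of the result markup and using the standard library's `xml.etree.ElementTree`, whose `findtext` returns a default when a path matches nothing; `per_result_items` is a hypothetical helper. In recent Scrapy versions, `extract_first(default='N/A')` on a per-result selector plays the same role as the `if ... else 'N/A'` lines above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: TV B has no listed price.
PAGE = """<html><body>
<div id="result_0"><h3><a><span>TV A</span></a></h3>
  <ul><li><div><a><span>£299</span></a></div></li></ul></div>
<div id="result_1"><h3><a><span>TV B</span></a></h3>
  <ul><li><a><span>£199</span></a></li></ul></div>
<div id="result_2"><h3><a><span>TV C</span></a></h3>
  <ul><li><div><a><span>£449</span></a></div></li></ul></div>
</body></html>"""


def per_result_items(root, default="N/A"):
    """Extract one item per result block, using `default` for a missing field."""
    items = []
    for res in root.iter("div"):
        if not res.get("id", "").startswith("result_"):
            continue
        # Both lookups are relative to this result, so a missing price
        # cannot borrow the next result's value.
        title = res.findtext("./h3/a/span", default=default).strip()
        price = res.findtext("./ul/li[1]/div/a/span", default=default).strip()
        items.append({"title": title, "price": price})
    return items


for item in per_result_items(ET.fromstring(PAGE)):
    print(item)
# {'title': 'TV A', 'price': '£299'}
# {'title': 'TV B', 'price': 'N/A'}
# {'title': 'TV C', 'price': '£449'}
```

TV B now gets a placeholder instead of TV C's price. It still does not pick up TV B's "new" price, which sits under a different path.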


2015-04-13 05:11:26 Jithin

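Thanks, Jithin, I appreciate the input. Unfortunately, it didn't solve my problem of picking up the missing items, but it did inspire an alternative approach, which I'll post below as a separate answer. – Predica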

I extended Jithin's approach with a couple of if/else statements, which solved my problem:

def parse(self, response):
    selector_object = response.xpath('//div[starts-with(@id,"result_")]')
    for select in selector_object:
        title = select.xpath('./h3/a/span/text()').extract()
        title = title[0].strip() if title else 'N/A'
        # The listed price and the "new" price live at different paths;
        # prefer the listed price and fall back to the "new" one.
        price = select.xpath('./ul/li[1]/div/a/span/text()').extract()
        new_price = select.xpath('./ul/li[1]/a/span[1]/text()').extract()
        if price:
            price = price[0].strip()
        elif new_price:
            price = new_price[0].strip()
        else:
            price = 'N/A'  # neither price is present for this result
        item = AmazonItem(
            title=title,
            price=price
        )
        yield item
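The `if`/`elif` chain generalizes to a list of candidate paths tried in order. A minimal, Scrapy-free sketch: it assumes made-up, simplified markup and uses the standard library's `xml.etree.ElementTree`; the helper names `first_text` and `extract` and the `PRICE_PATHS` list are hypothetical. Supporting another price format (such as the "used" price) then means adding one path rather than another branch.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a listed price, a "new" price, and a result with no price.
PAGE = """<html><body>
<div id="result_0"><h3><a><span>TV A</span></a></h3>
  <ul><li><div><a><span>£299</span></a></div></li></ul></div>
<div id="result_1"><h3><a><span>TV B</span></a></h3>
  <ul><li><a><span>£199</span><span>new</span></a></li></ul></div>
<div id="result_2"><h3><a><span>TV C</span></a></h3></div>
</body></html>"""

# Candidate price locations, most preferred first (listed price, then "new" price).
PRICE_PATHS = ["./ul/li[1]/div/a/span", "./ul/li[1]/a/span[1]"]


def first_text(node, paths, default="N/A"):
    """Return the stripped text of the first path that yields non-empty text."""
    for path in paths:
        el = node.find(path)
        if el is not None and el.text and el.text.strip():
            return el.text.strip()
    return default


def extract(root):
    """Build one item per result block, trying each price path in turn."""
    items = []
    for res in root.iter("div"):
        if res.get("id", "").startswith("result_"):
            items.append({
                "title": first_text(res, ["./h3/a/span"]),
                "price": first_text(res, PRICE_PATHS),
            })
    return items


for item in extract(ET.fromstring(PAGE)):
    print(item)
# {'title': 'TV A', 'price': '£299'}
# {'title': 'TV B', 'price': '£199'}
# {'title': 'TV C', 'price': 'N/A'}
```

Returning `default` after the loop plays the role of a final `else` branch.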


2015-04-13 20:13:43 Predica
