2017-11-18

I am using CSVFeedSpider to scrape a local CSV file (foods.csv), and I get an error when running it.

Here is the file:

calories name       price 

650   Belgian Waffles    $5.95 

900   Strawberry Belgian Waffles $7.95 

900   Berry-Berry Belgian Waffles $8.95 

600   French Toast     $4.50 

950   Homestyle Breakfast   $6.95 

Here is the code in my foods.py file:

from scrapy.spiders import CSVFeedSpider 
from foods_csv.items import FoodsCsvItem 

class FoodsSpider(CSVFeedSpider): 
    name = 'foods' 
    start_urls = ['file:///users/Mina/Desktop/foods.csv'] 
    delimiter = ';' 
    quotechar = "'" 
    headers = ['name', 'price', 'calories'] 

    def parse_row(self, response, row): 
        self.logger.info('Hi, this is a row!: %r', row)
        item = FoodsCsvItem()
        item['name'] = row['name']
        item['price'] = row['price']
        item['calories'] = row['calories']
        return item

items.py

import scrapy 

class FoodsCsvItem(scrapy.Item): 
    name = scrapy.Field() 
    price = scrapy.Field() 
    calories = scrapy.Field() 

But it gives me this error:

2017-11-18 13:04:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET file:///users/Mina/Desktop/foods.csv> (referer: None) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 1 (length: 1, should be: 3) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 2 (length: 1, should be: 3) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 3 (length: 1, should be: 3) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 4 (length: 1, should be: 3) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 5 (length: 1, should be: 3) 
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 6 (length: 1, should be: 3) 

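At first I was only scraping 'name' and 'price', but that gave me the same error, so I tried adding 'calories' as suggested in "Scrapy: Scraping CSV File - not getting any output". Nothing changed!

I only need to scrape 'name' and 'price'. How can I do that?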

Answers

1

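It looks like the exact formatting of your CSV file may have been lost when you posted it. If the format is exactly as posted here, it actually looks like a TSV (tab-separated values) file, and you could try changing delimiter = ';' to delimiter = '\t'.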

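However, since you have specified ' as the quote character, I assume that part is correct? I would try running a search/replace on the CSV file to replace ' with " and see whether that helps. I have had some odd problems with single quotes before.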
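The delimiter theory can be checked outside Scrapy with the standard csv module. This is only a minimal sketch: the SAMPLE text is a hypothetical tab-separated stand-in for foods.csv (the real file's separator is not known from the post), and field_counts is an illustrative helper, not part of Scrapy.

```python
import csv
import io

# Hypothetical stand-in for foods.csv, ASSUMING it is tab-separated.
SAMPLE = (
    "calories\tname\tprice\n"
    "650\tBelgian Waffles\t$5.95\n"
    "900\tStrawberry Belgian Waffles\t$7.95\n"
)


def field_counts(text, delimiter):
    """Return how many fields csv.reader finds on each line."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar="'")
    return [len(row) for row in reader]


# With ';' no separator is ever found, so every line is one single field.
wrong = field_counts(SAMPLE, ";")
# With '\t' each line splits into the three expected columns.
right = field_counts(SAMPLE, "\t")

print("delimiter ';' :", wrong)
print("delimiter '\\t':", right)
```

A count of 1 on every line matches the "length: 1, should be: 3" warnings in the log: the rows are being read, but never split. If the real file gives 3 fields with a different separator, that separator is the value to use for delimiter in the spider.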

-1

Try this:

def parse_row(self, response, row): 
    self.logger.info('Hi, this is a row!: %r', row)
    item = FoodsCsvItem()
    item['name'] = row['name']
    item['price'] = row['price']
    item['calories'] = row['calories']
    return item
+0

OK. I edited it, but it gives me the same error. – MAGS94
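For the original goal of keeping only 'name' and 'price', here is a rough sketch using the standard csv module rather than Scrapy. It assumes the file is tab-separated and that its first line is the header row (calories, name, price); SAMPLE and name_and_price are made up for illustration.

```python
import csv
import io

# Hypothetical tab-separated sample; the first line is the header row.
SAMPLE = (
    "calories\tname\tprice\n"
    "650\tBelgian Waffles\t$5.95\n"
    "600\tFrench Toast\t$4.50\n"
)


def name_and_price(text, delimiter="\t"):
    """Read rows keyed by the file's own header and keep two columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{"name": row["name"], "price": row["price"]} for row in reader]


rows = name_and_price(SAMPLE)
for row in rows:
    print(row)
```

Taking the column names from the file's header row avoids two side effects of a hand-written list such as headers = ['name', 'price', 'calories']: the order does not match the file (calories comes first), and the header line itself gets processed as a data row, which would explain why the log mentions six rows for five foods. As far as I know, CSVFeedSpider also falls back to the first row when headers is not set, so in the spider it may be enough to fix delimiter, drop headers, and read only row['name'] and row['price'] in parse_row.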