How to insert data from Scrapy into MySQL

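I am trying to scrape data from Amazon using Scrapy. I can get the data into a CSV file, but I am not able to insert it into a MySQL database. Please find my code below. My spider is: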

import scrapy
from craigslist_sample.items import AmazonDepartmentItem
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

class AmazonAllDepartmentSpider(scrapy.Spider):

    name = "amazon"
    allowed_domains = ["amazon.com"]
    start_urls = [
        "http://www.amazon.com/gp/site-directory/ref=nav_sad/187-3757581-3331414"
    ]

    def parse(self, response):
        for sel in response.xpath('//ul/li'):
            item = AmazonDepartmentItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
        return item

My pipeline code is:

import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class MySQLStorePipeline(object):

    host = 'rerhr.com'
    user = 'amazon'
    password = 'sads23'
    db = 'amazon_project'

    def __init__(self):
        self.connection = MySQLdb.connect(self.host, self.user, self.password, self.db)
        self.cursor = self.connection.cursor()

    def process_item(self, item, spider):
        try:
            self.cursor.execute("""INSERT INTO amazon_project.ProductDepartment (ProductDepartmentLilnk)
                    VALUES (%s)""",
                    (
                    item['link'].encode('utf-8')))

            self.connection.commit()

        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item

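When I run the following command

scrapy crawl amazon -o items.csv -t csv

I get the data in my CSV file. But when I run

scrapy crawl amazon

with the code above, I am not able to insert the data into MySQL. Please help: what do I have to do so that the data is inserted into MySQL?

Thanks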

Source

2014-12-05 wiretext

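What is on the console? Any errors? Is the pipeline enabled in the settings? Are you sure you are checking the results in the same database you are inserting into? Thanks. – alecxe 2014-12-05 15:02:09

My pipeline setting is ITEM_PIPELINES = ['projectname.pipelines.MySQLStorePipeline'], and yes, I am checking the same database – wiretext 2014-12-05 15:19:24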

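The problem is actually in the parse() callback. extract() calls return lists, so all item field values become lists. The item['link'].encode('utf-8') call in the pipeline then fails, because a list has no encode() method.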
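To make the failure concrete, here is a minimal sketch in plain Python with no Scrapy involved. The list literal is only a stand-in for what sel.xpath('a/@href').extract() returns; the value itself is made up for illustration.

```python
# Stand-in for sel.xpath('a/@href').extract(): a list, even for a single match.
link = ["/gp/site-directory"]

try:
    # This mirrors item['link'].encode('utf-8') in the pipeline.
    link.encode("utf-8")
    failed = False
except AttributeError:
    # Lists have no encode() method, so the call never reaches the database.
    failed = True

# Encoding works once a single string is taken out of the list.
first_link = link[0].encode("utf-8")
```

Note that AttributeError is not a MySQLdb.Error, so the except clause in the pipeline does not catch it.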

A quick and simple fix would be to take the first element of each extract() result:

def parse(self, response):
    for sel in response.xpath('//ul/li'):
        item = AmazonDepartmentItem()
        item['title'] = sel.xpath('a/text()').extract()[0]
        item['link'] = sel.xpath('a/@href').extract()[0]
        item['desc'] = sel.xpath('text()').extract()[0]
        yield item

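Note that I also replaced the return item expression with yield item and moved it inside the loop, so that every item is emitted rather than only the last one.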
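One caveat with indexing by [0]: if an li element has no matching child, extract() returns an empty list and [0] raises IndexError. A small helper avoids that. This is only a sketch; the name first and its default parameter are hypothetical, not part of Scrapy.

```python
def first(values, default=None):
    """Return the first element of a list such as extract() produces, or a default."""
    return values[0] if values else default


# Stand-ins for what extract() might return for two different <li> elements.
with_link = ["/gp/site-directory"]
without_link = []

link_a = first(with_link)                 # the only element
link_b = first(without_link, default="")  # falls back instead of raising IndexError
```

More recent Scrapy releases also provide extract_first() and get() on selector results, which serve the same purpose.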

A better approach is to define an ItemLoader with input and output processors:

from scrapy.contrib.loader import ItemLoader 
from scrapy.contrib.loader.processor import TakeFirst 

class ProductLoader(ItemLoader): 
    default_output_processor = TakeFirst()

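FYI, here is what TakeFirst() does:

Returns the first non-null/non-empty value from the values received, so it is typically used as an output processor for single-valued fields. It does not receive any constructor arguments, nor does it accept Loader contexts.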
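The behaviour described above can be approximated in a few lines of plain Python. This is an illustrative sketch, not Scrapy's own source; the function name take_first is hypothetical.

```python
def take_first(values):
    """Return the first value that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    # Nothing usable was collected for this field.
    return None


# None and "" are skipped; the first real value wins.
title = take_first([None, "", "Books", "Music"])
empty = take_first([])
```

Applied as an output processor, this is what turns the list collected for each field into a single string.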

Then the parse() method would become:

def parse(self, response):
    for sel in response.xpath('//ul/li'):
        # Use the ProductLoader defined above so that TakeFirst() is applied.
        l = ProductLoader(item=AmazonDepartmentItem(), selector=sel)
        l.add_xpath('title', 'a/text()')
        l.add_xpath('link', 'a/@href')
        l.add_xpath('desc', 'text()')
        yield l.load_item()
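On the pipeline side, once each field is a single string, the insert itself is straightforward. The sketch below shows the general shape using Scrapy's open_spider, process_item and close_spider hooks. It is an assumption-laden stand-in: it uses sqlite3 from the standard library in place of MySQLdb so that it runs without a database server, the class name SQLiteStorePipeline is hypothetical, and a plain dict stands in for the Scrapy item.

```python
import sqlite3


class SQLiteStorePipeline:
    """Stand-in for MySQLStorePipeline; sqlite3 is used only so the sketch runs anywhere."""

    def open_spider(self, spider):
        # With MySQLdb this would be MySQLdb.connect(host, user, password, db).
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute(
            "CREATE TABLE ProductDepartment (ProductDepartmentLilnk TEXT)"
        )

    def process_item(self, item, spider):
        # Parameters go in a one-element tuple (note the trailing comma).
        # sqlite3 uses "?" placeholders; MySQLdb uses "%s".
        self.connection.execute(
            "INSERT INTO ProductDepartment (ProductDepartmentLilnk) VALUES (?)",
            (item["link"],),
        )
        self.connection.commit()
        return item

    def close_spider(self, spider):
        self.connection.close()


# Drive the pipeline by hand with a plain dict in place of a Scrapy item.
pipeline = SQLiteStorePipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({"link": "/gp/site-directory"}, spider=None)
rows = pipeline.connection.execute(
    "SELECT ProductDepartmentLilnk FROM ProductDepartment"
).fetchall()
pipeline.close_spider(spider=None)
```

Opening the connection in open_spider and closing it in close_spider ties the connection's lifetime to the crawl, rather than leaving it open as the __init__ approach does.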

Source

2014-12-05 15:22:29 alecxe

You are a genius, thanks mate :) You saved me a lot of time – wiretext 2014-12-05 16:14:30
