How to parse multiple pages with Scrapy

I keep getting an "invalid syntax" error on this line:

1.add_xpath('tagLine', '//p[@class="tagline"]/text()')

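I can't figure out why it gives me this error, because as far as I can tell it uses the same syntax as all the other 1.add_xpath() calls. My other question is how to request additional pages. Basically, I'm going through one big listing page and following every link on it; once that page is done, I want the spider to follow the "next" button to the next listing page, but I don't know how to do that.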

def parse(self, response): 
    hxs = HtmlXPathSelector(response) 
    for url in hxs.select('//a[@class="title"]/@href').extract():
        yield Request(url, callback=self.description_page)
    for url_2 in hxs.select('//a[@class="POINTER"]/@href').extract():
        yield Request(url_2, callback=self.description_page)

def description_page(self, response): 
    l = XPathItemLoader(item=TvspiderItem(), response=response) 
    l.add_xpath('title', '//div[@class="m show_head"]/h1/text()') 
    1.add_xpath('tagLine', '//p[@class="tagline"]/text()') 
    1.add_xpath('description', '//div[@class="description"]/span') 
    1.add_xpath('rating', '//div[@class="score"]/text()') 
    1.add_xpath('imageSrc', '//div[@class="image_bg"]/img/@src') 
    return l.load_item()

Any help would be greatly appreciated. I'm still a noob when it comes to Python and Scrapy.

Source

2012-01-20 AlexW.H.B.

def description_page(self, response): 
    l = XPathItemLoader(item=TvspiderItem(), response=response) 
    l.add_xpath('title', '//div[@class="m show_head"]/h1/text()') 
    1.add_xpath('tagLine', '//p[@class="tagline"]/text()') 
    1.add_xpath('description', '//div[@class="description"]/span') 
    1.add_xpath('rating', '//div[@class="score"]/text()') 
    1.add_xpath('imageSrc', '//div[@class="image_bg"]/img/@src') 
    return l.load_item()

You have the digit 1 instead of the variable name l.

Source
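
To see why Python rejects that line before the spider even runs, here is a small standard-library sketch (no Scrapy needed). The helper name is_valid_python is made up for illustration. A line starting with "1." is read as the start of a number, so the name that follows makes it a syntax error, while the same line starting with the letter l parses as an ordinary method call.

```python
import ast


def is_valid_python(source):
    """Return True if `source` parses as Python, False on a SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Same call as in the question, once with the digit 1 and once with the letter l.
broken = "1.add_xpath('tagLine', '//p[@class=\"tagline\"]/text()')"
fixed = "l.add_xpath('tagLine', '//p[@class=\"tagline\"]/text()')"

print(is_valid_python(broken))  # False
print(is_valid_python(fixed))   # True
```

Parsing only checks syntax, so the second line passes even though no variable named l exists in this snippet.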

2012-01-20 09:30:08 warvariuc

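You are a god among men. Thank you so much. How did you even spot that? That's crazy. :) Do you also have any idea why it only parses one page instead of every page? –

I think I caught it because I use a different set of fonts. In my font, the difference between 1 and l is quite noticeable. :) – warvariuc

That's awesome! I spent 20 minutes staring at that piece of code trying to spot the difference. I guess that's a lesson for me not to use variable names that invite such simple mistakes. But thanks again. :) –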
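
On the follow-up about only one page being parsed: the thread never settles this, so what follows is only a guess. In the question's parse(), the link with class "POINTER" is sent to description_page. If that link is the "next" button (an assumption), its request would need callback=self.parse so that the next listing page is handled like the first one. Current Scrapy versions also refuse a Request whose URL has no scheme, so if the extracted hrefs are relative (also an assumption), they would have to be resolved against the page URL first. A minimal standard-library sketch of that resolution step, with a hypothetical helper name and made-up URLs:

```python
from urllib.parse import urljoin


def absolute_links(page_url, hrefs):
    """Resolve each extracted href against the URL of the page it came from."""
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical values for illustration only; they are not from the question.
page_url = "http://example.com/shows?page=1"
hrefs = ["/shows/some-show", "/shows?page=2", "http://example.com/about"]

links = absolute_links(page_url, hrefs)
for link in links:
    print(link)
```

Inside the spider, the same idea would look roughly like yield Request(urljoin(response.url, url_2), callback=self.parse) for the "next" link, while the per-show links keep callback=self.description_page.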
