Scrapy's HtmlXPathSelector returns null results

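I have just started learning Python/Scrapy. I was able to follow the tutorial successfully, but I am struggling with a "test" scrape that I want to do on my own.

What I am trying to do now is go to http://jobs.walmart.com/search/finance-jobs and scrape the job listings.

However, I think I am doing something wrong in the XPath, and I am not sure what.

The table has no "id", so I am selecting it by its class.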

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class MySpider(BaseSpider):
    name = "walmart"
    allowed_domains = ["jobs.walmart.com"]
    start_urls = ["http://jobs.walmart.com/search/finance-jobs"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//table[@class='tableSearchResults']")
        items = []
        for titles in titles:
            item = walmart()
            item["title"] = titles.select("a/text()").extract()
            item["link"] = titles.select("a/@href").extract()
            items.append(item)
        return items

Here is what the page's source looks like:


2014-11-23 Sako

As you said yourself, the problem is your XPath. It is always useful to run:

scrapy view http://jobs.walmart.com/search/finance-jobs

before running your spider, to see what the site looks like from Scrapy's point of view.

This should work now:

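from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

# walmart is the Item class used in the question; import it from
# wherever your project defines it (the question does not show this).


class MySpider(BaseSpider):
    name = "walmart"
    allowed_domains = ["jobs.walmart.com"]
    start_urls = ["http://jobs.walmart.com/search/finance-jobs"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Select the rows of the results table, not the table itself.
        titles = hxs.select("//table[@class='tableSearchResults']/tr")
        items = []
        for title in titles:
            # Skip rows (such as headers) that contain no job link.
            if title.select("td[@class='td1']/a").extract():
                # Create a fresh item per row so earlier rows are not overwritten.
                item = walmart()
                item["title"] = title.select("td[@class='td1']/a/text()").extract()
                item["link"] = title.select("td[@class='td1']/a/@href").extract()
                items.append(item)
        return items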
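
To see why the original relative path "a" matched nothing, here is a minimal sketch using Python's standard-library ElementTree instead of Scrapy's selector. The markup in SAMPLE is hypothetical (the real page source is not shown in this thread); it only assumes that the links sit inside tr/td[@class='td1'], as the XPath above implies.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, NOT the real page source: the links live inside
# <tr>/<td class="td1">, not directly under <table>.
SAMPLE = """
<html><body>
  <table class="tableSearchResults">
    <tr><th>Job Title</th></tr>
    <tr><td class="td1"><a href="/job/1">Finance Analyst</a></td></tr>
    <tr><td class="td1"><a href="/job/2">Senior Accountant</a></td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)
table = root.find(".//table[@class='tableSearchResults']")

# The relative path "a" only matches <a> elements that are direct
# children of <table>, so nothing is found.
direct_links = table.findall("a")

# Stepping through tr and td reaches the links.
jobs = []
for row in table.findall("tr"):
    link = row.find("td[@class='td1']/a")
    if link is not None:  # header rows have no link
        jobs.append({"title": link.text, "link": link.get("href")})

print(direct_links)
print(jobs)
```

The same reasoning applies to the selector in the spider: a relative XPath is evaluated from the node it is called on, so each intermediate element has to appear in the path (or be skipped with a descendant step such as ".//a").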


2014-11-23 22:48:08

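I'll let you know how this goes tonight! – Sako 2014-11-26 17:14:31

When I run the command scrapy crawl walmart -o items.csv -t csv, a CSV file is created, but there are no results in it. It is a blank file. Am I missing something? – Sako 2014-11-26 23:55:12

Try this (scrapy crawl walmart -o items.json) and check whether you can see the results. If so, you can then export them to a CSV file. – 2014-11-27 00:06:19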
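
If the JSON export does contain items, converting it to CSV can be done with the standard library alone. The following is only a sketch: ITEMS_JSON stands in for the contents of items.json and is made up for illustration, and items_to_csv is a hypothetical helper name. It assumes each field is a list of strings, which is what extract() returns.

```python
import csv
import io
import json

# Hypothetical stand-in for the contents of items.json; extract()
# returns lists, so every field value is a list of strings.
ITEMS_JSON = (
    '[{"title": ["Finance Analyst"], "link": ["/job/1"]},'
    ' {"title": ["Senior Accountant"], "link": ["/job/2"]}]'
)


def items_to_csv(json_text, fields=("title", "link")):
    """Convert a JSON array of scraped items into CSV text."""
    items = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    for item in items:
        # Join each list-valued field into a single cell.
        writer.writerow({key: " ".join(item.get(key, [])) for key in fields})
    return out.getvalue()


csv_text = items_to_csv(ITEMS_JSON)
print(csv_text)
```

An empty JSON array produces only the header row, which makes it easy to tell whether the spider returned no items or the export step lost them.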
