Scrapy/Python: Replace empty strings

So here is my Scrapy spider code. I am trying to extract metadata values from a website. None of the meta tags appears more than once on a page.

import scrapy


class MySpider(scrapy.Spider):
    name = "courses"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/listing"]

    def parse(self, response):
        # Each meta tag appears at most once per page, so query the document
        # directly and yield one item per page (looping over every <meta> with
        # absolute //meta[...] paths would yield the same item many times).
        yield {
            "ScoreA": response.xpath('//meta[@name="atarbur"]/@content').extract_first(),
            "ScoreB": response.xpath('//meta[@name="atywater"]/@content').extract_first(),
            "ScoreC": response.xpath('//meta[@name="atarsater"]/@content').extract_first(),
            "ScoreD": response.xpath('//meta[@name="clearlywaur"]/@content').extract_first(),
        }
        # Follow the listing links and parse those pages the same way.
        for url in response.xpath('//ul[@class="scrapy"]/li/a/@href').extract():
            yield scrapy.Request(response.urljoin(url), callback=self.parse)

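What I want to achieve is this: if the value of any score is an empty string (""), I want to replace it with 0 (zero). I'm not sure how to add conditional logic inside the 'yield' block.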
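If you would rather keep the yield block itself free of conditionals, one option is to build the dict first and clean it up in a single pass before yielding it. The sketch below uses a hypothetical helper, fill_empty(), which is not part of Scrapy. It turns empty or whitespace-only strings into 0 and deliberately leaves None untouched (None is what extract_first() returns when a meta tag is missing entirely), so the two cases stay distinguishable.

```python
def fill_empty(item, placeholder=0):
    """Hypothetical helper: return a copy of item where every empty or
    whitespace-only string value is replaced by placeholder."""
    return {
        key: placeholder if isinstance(value, str) and not value.strip() else value
        for key, value in item.items()
    }


# Example: the kind of dict a parse() callback builds before yielding it.
raw = {"ScoreA": "85", "ScoreB": "", "ScoreC": "  ", "ScoreD": None}
print(fill_empty(raw))
# {'ScoreA': '85', 'ScoreB': 0, 'ScoreC': 0, 'ScoreD': None}
```

Inside parse() this would be used as `yield fill_empty({...})`. If a missing tag should also count as 0, extend the condition with `value is None`.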

Any help is much appreciated.

Thanks

2017-05-23 Slyper

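The extract_first() method takes an optional default-value argument, but that default is only used when the XPath matches nothing, not when it matches an empty attribute. In your case you can simply use an `or` expression: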

foo = response.xpath('//foo').extract_first('').strip() or 0

Here, if extract_first() returns a string with no text in it, that string evaluates as falsy, so the expression falls through to its last operand (0).
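To see exactly which values the `or 0` fallback replaces, the sketch below applies the same expression to plain strings standing in for extract_first('') results, so no Scrapy is needed. Note that the text '0' is a non-empty string and therefore truthy, so it is kept as '0'. The '' default also matters: without it, extract_first() returns None when nothing matches, and None.strip() raises AttributeError.

```python
def fallback(extracted):
    """Apply the answer's `.strip() or 0` expression to an already-extracted string."""
    return extracted.strip() or 0


# Stand-ins for what extract_first('') might return.
for extracted in ["", "   ", "42", "0"]:
    print(repr(extracted), "->", repr(fallback(extracted)))
# '' -> 0
# '   ' -> 0
# '42' -> '42'
# '0' -> '0'

# Without the '' default, a missing tag yields None, and None has no .strip().
try:
    fallback(None)
except AttributeError as exc:
    print("AttributeError:", exc)
```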

To convert the string to another type, try:

foo = int(response.xpath('//foo').extract_first('').strip() or 0)
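int() accepts integer text such as '42' (surrounding whitespace is allowed) but raises ValueError for '3.5', so the int(... or 0) form only suits scores that are always whole numbers. If some scores may be decimal, a small helper like to_number() below (hypothetical, not a Scrapy API) could try int first and fall back to float. It also treats None (missing tag) and blank strings as 0. Whether mixing int and float values in one field is acceptable depends on how you consume the items.

```python
def to_number(text, default=0):
    """Hypothetical helper: convert extracted text to int, or to float if it
    is not an integer literal; None or blank text becomes default."""
    text = (text or "").strip()  # handles None as well as '' and whitespace
    if not text:
        return default
    try:
        return int(text)
    except ValueError:
        # Still raises ValueError for non-numeric text such as 'n/a'.
        return float(text)


print(to_number("42"), to_number(" 3.5 "), to_number(""), to_number(None))
# 42 3.5 0 0
```

In the spider this would wrap each value, e.g. `"ScoreA": to_number(response.xpath('//meta[@name="atarbur"]/@content').extract_first())`.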

2017-05-23 07:43:15 Granitosaurus

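Works like a charm. Thanks. Quick question: my code above returns the numeric values as strings, i.e. wrapped in quotes. Do you know how I can return the values without quotes? – Slyper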

@Slyper Yes, Scrapy's extract() and extract_first() always return a string or a list of strings. However, you can convert the result to a float or int; see my edit. – Granitosaurus

Great, thanks again. – Slyper
