Is there a following-sibling count in Scrapy? I am trying to scrape the Claim from the following HTML:
<FONT COLOR=#5FA505><B>Claim:</B></FONT> Coed makes unintentionally risqué remark about professor's "little quizzies."
<BR><BR>
<CENTER><IMG SRC="/images/content-divider.gif"></CENTER>
I am using this code:
def parse_article(self, response):
    # Print every text node that follows the <font> whose <b> child is "Claim:"
    for href in response.xpath('//font[b = "Claim:"]/following-sibling::text()'):
        print(href.extract())
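This successfully pulls the correct Claim: value that I want from the HTML above, but it also pulls the following HTML (another block with a similar structure on the same page). My xpath() is meant to pull in only the font tag named Claim:, so why does it also pull the Origins text below? How can I fix it? I tried to see whether I could get only the next following-sibling instead of all of them, but that did not work.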
<FONT COLOR=#5FA505 FACE=""><B>Origins:</B></FONT> Print references to the "little quizzies" tale date to 1962, but the tale itself has been around since the early 1950s. It continues to surface among college students to this day. Similar to a number of other college legends
'.extract()[0]' –
@JohnDene My output changes, but it is just a lot of empty space, with an occasional ',' every now and then. – Rafa
I think that is because you are using a for loop. If I understand correctly, you only want to extract one value? –