Scrape text from a node, excluding the text of its last child

I'm trying to scrape quotes from Goodreads. I only need the quote itself, not the author's name.

Here is the HTML source:

<div class="quoteText"> 
     “Don't cry because it's over, smile because it happened.” 
    <br> ― 
    <a class="authorOrTitle" href="/author/show/61105.Dr_Seuss">Dr. Seuss</a> 
</div>

I tried the following, but the result also includes the author information:

quotes = [quote.text.strip() for quote in soup.findAll('div', {'class':'quoteText'})]

I also tried contents[0], but that fails for multi-line quotes such as this one:

<div class="quoteText"> 
     “You've gotta dance like there's nobody watching, 
<br> 
Love like you'll never be hurt, 
<br> 
Sing like there's nobody listening, 
<br> 
And live like it's heaven on earth.” 
    <br> ― 
    <a class="authorOrTitle" href="/author/show/1744830.William_W_Purkey">William W. Purkey</a> 
</div>
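A structural alternative, sketched with BeautifulSoup's html.parser backend: walk the div's direct children and stop at the author link, so nothing after it is collected. This assumes the author always sits in an <a class="authorOrTitle"> child preceded by a "―" separator, as in the two snippets above; quote_without_author and HTML are illustrative names, not part of any API.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Sample markup copied from the two snippets above.
HTML = """
<div class="quoteText">
     “Don't cry because it's over, smile because it happened.”
    <br> ―
    <a class="authorOrTitle" href="/author/show/61105.Dr_Seuss">Dr. Seuss</a>
</div>
<div class="quoteText">
     “You've gotta dance like there's nobody watching,
<br>
Love like you'll never be hurt,
<br>
Sing like there's nobody listening,
<br>
And live like it's heaven on earth.”
    <br> ―
    <a class="authorOrTitle" href="/author/show/1744830.William_W_Purkey">William W. Purkey</a>
</div>
"""


def quote_without_author(div):
    """Join the div's direct text children that come before the author link."""
    pieces = []
    for child in div.children:
        # Stop once we reach the <a class="authorOrTitle"> element.
        is_author_link = (
            isinstance(child, Tag)
            and child.name == "a"
            and "authorOrTitle" in child.get("class", [])
        )
        if is_author_link:
            break
        # Keep bare text nodes; <br> tags carry no text and are skipped.
        if isinstance(child, NavigableString):
            pieces.append(str(child))
    # Collapse whitespace (including the newlines around <br>) to single spaces,
    # then drop the trailing "―" separator that precedes the author link.
    text = " ".join("".join(pieces).split())
    return text.rstrip(" ―")


soup = BeautifulSoup(HTML, "html.parser")
quotes = [quote_without_author(div) for div in soup.find_all("div", class_="quoteText")]
for q in quotes:
    print(q)
```

Because the loop breaks at the link rather than relying on line positions, multi-line quotes come back whole, with their line breaks folded into spaces.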


2017-07-31 Chankey Pathak

This one is simple. quote.text.strip() gives you a string, in this case '“Don't cry because it's over, smile because it happened.”\n ―\n Dr. Seuss', which you can split on \n to keep only the quote. Example: [quote.text.strip().split("\n")[0] for quote in soup.findAll("div", {"class":"quoteText"})]
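To see what the split does, here is a sketch on hard-coded strings shaped like what quote.text.strip() returns for the two divs in the question (the exact whitespace is an assumption). Taking the first line works for the single-line quote but keeps only the first line of the multi-line one; splitting on the last "―" separator instead keeps the whole quote. first_line and before_separator are hypothetical helper names.

```python
# Hard-coded stand-ins for quote.text.strip(); real whitespace may differ.
single = "“Don't cry because it's over, smile because it happened.” \n    ― \n    Dr. Seuss"
multi = (
    "“You've gotta dance like there's nobody watching, \n\n"
    "Love like you'll never be hurt, \n\n"
    "Sing like there's nobody listening, \n\n"
    "And live like it's heaven on earth.” \n    ― \n    William W. Purkey"
)


def first_line(text):
    """The approach from this answer: keep everything before the first newline."""
    return text.split("\n")[0].strip()


def before_separator(text):
    """Keep everything before the last '―', joining the quote's lines with spaces."""
    quote = text.rsplit("―", 1)[0]
    return " ".join(quote.split())


print(first_line(single))       # the whole quote
print(first_line(multi))        # only the first line of the quote
print(before_separator(multi))  # the full multi-line quote
```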

If you don't want the quotation marks (i.e. “ and ”), you can remove them with str.replace().
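A minimal sketch of the replace() suggestion; strip_curly_quotes is a hypothetical helper name.

```python
def strip_curly_quotes(quote):
    """Remove every “ and ” character from the quote text."""
    return quote.replace("“", "").replace("”", "")


print(strip_curly_quotes("“Don't cry because it's over, smile because it happened.”"))
```

replace() removes the marks anywhere in the string; quote.strip("“”") would instead trim them only from the two ends, leaving any inner curly quotes intact.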


2017-07-31 05:39:01 Gahan

Oh, just replace it. Strange that it didn't cross my mind. –
