How to get rid of the whitespace above the text, using bs4

OK, so I am using bs4 (BeautifulSoup) to parse a website and find the specific headlines I am looking for. My code looks like this:

import requests
from bs4 import BeautifulSoup

url = 'http://www.ewn.co.za/Categories/Local'
r = requests.get(url).text
soup = BeautifulSoup(r)
for i in soup.find_all(class_='article-short'):
    if i.a:
        print(i.a.text.replace('\n', '').strip())
    else:
        print(i.contents[0].strip())

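This code works, but its output starts with about 20 blank lines before the headlines from the website are printed. Is something wrong with my code, or is there anything I can do to get rid of the blank lines?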

Source

2016-05-14 raid3r

With the strip function you can remove whitespace from the ends of a string (https://docs.python.org/3/library/stdtypes.html#str.strip) – Querenker
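A minimal sketch of what str.strip does and does not do here; the sample strings are made up for illustration. It removes whitespace at both ends of a string, but a string that held only whitespace becomes empty, and printing an empty string still produces a blank line.

```python
# Sample strings are hypothetical, chosen only to illustrate str.strip.
padded = "\n\n   Some headline  \n"
print(repr(padded.strip()))  # leading and trailing whitespace is gone

whitespace_only = "\n   \n"
cleaned = whitespace_only.strip()
print(repr(cleaned))  # nothing is left

# print(cleaned) would still emit a newline, so empty results
# have to be skipped explicitly to avoid blank lines in the output.
if cleaned:
    print(cleaned)
```

So stripping alone does not remove the blank lines; the empty results also need to be filtered out.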

That is because you have content like this:

<article class="article-short"> 
<div class="thumb"><a href="http://ewn.co.za/2016/05/14/Contralesa-against-scrapping-initiation-due-to-cold-weather"><img alt="FILE: Boys who have undergone a circumcision ceremony walk near Qunu in the Eastern Cape in 2013. Picture: AFP." height="147" src="http://ewn.co.za/cdn/-%2fmedia%2f3C37CB28056746CD95FC913757AAD41C.ashx%3fas%3d1%26h%3d147%26w%3d234%26crop%3d1;waeb9b8157b3e310df" width="234"/></a></div> 
<h6 class="h6-mega"><a href="http://ewn.co.za/2016/05/14/Contralesa-against-scrapping-initiation-due-to-cold-weather">Contralesa against scrapping initiation due to cold weather</a></h6> 
</article>

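where the first link contains only an image and no text, so i.a.text is empty and print() outputs a blank line for it.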

You should look for the h6 tag instead. So something like this works:

import requests
from bs4 import BeautifulSoup

url = 'http://www.ewn.co.za/Categories/Local'
r = requests.get(url).text
soup = BeautifulSoup(r)
for i in soup.find_all(class_='article-short'):
    # Take the headline from the h6 tag; the first link only wraps the thumbnail image
    title = (i.h6.text.replace('\n', '') if i.h6 else i.contents[0]).strip()
    if title:  # skip entries that have no text
        print(title)
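The difference can be checked offline, without fetching the page. The sketch below assumes a trimmed copy of the markup quoted above (the img attributes are shortened placeholders) plus a second, made-up article with no h6 tag, and it passes 'html.parser' explicitly so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the quoted markup; the second article is a hypothetical
# case with no <h6>, included only to exercise the filtering.
html = """
<article class="article-short">
<div class="thumb"><a href="http://ewn.co.za/2016/05/14/Contralesa-against-scrapping-initiation-due-to-cold-weather"><img alt="FILE" src="x.jpg"/></a></div>
<h6 class="h6-mega"><a href="http://ewn.co.za/2016/05/14/Contralesa-against-scrapping-initiation-due-to-cold-weather">Contralesa against scrapping initiation due to cold weather</a></h6>
</article>
<article class="article-short">
<div class="thumb"><a href="#"><img alt="" src="y.jpg"/></a></div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
articles = soup.find_all(class_="article-short")

# The first <a> in the article wraps only the thumbnail, so it has no text.
first_link_text = articles[0].a.get_text(strip=True)

# Collect headlines from <h6> and skip articles that yield nothing.
titles = []
for article in articles:
    if article.h6:
        title = article.h6.get_text(strip=True)
        if title:
            titles.append(title)

print(repr(first_link_text))
print(titles)
```

Here get_text(strip=True) stands in for the .text.replace('\n', '').strip() chain; both are reasonable ways to clean the headline.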

Source

2016-05-14 13:44:05 aldanor

Thanks! @aldanor It works much better now! – raid3r
