2017-07-31

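How to remove span from td with Beautiful Soup (Python 3.5)

I am scraping the Yahoo Finance website to get company stock data. I use Beautiful Soup to extract the td tags, but I want to remove the span tags inside them and have not been able to. Below are a few lines of the HTML from which I need to extract the text.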

[<td class="Py(10px) Ta(start)" data-reactid="53"><span data-reactid="54">31-Jul-2017</span></td>,
 <td class="Py(10px)" data-reactid="55"><span data-reactid="56">991.90</span></td>,
 <td class="Py(10px)" data-reactid="57"><span data-reactid="58">1,021.70</span></td>,
 <td class="Py(10px)" data-reactid="59"><span data-reactid="60">986.75</span></td>,
 <td class="Py(10px)" data-reactid="61"><span data-reactid="62">1,011.20</span></td>]

The code below gives me the output above.

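# Imports assumed from the aliases used below (not shown in the original post)
import urllib.request as url
from bs4 import BeautifulSoup as soup

INFY = url.urlopen("https://in.finance.yahoo.com/quote/INFY.NS/history?p=INFY.NS")
INFYHis = INFY.read()
INFYSoup = soup(INFYHis, 'html.parser')
INFYtd = INFYSoup.findAll("td", {"class": "Py(10px)"})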

I am very new to Python and am not sure how to remove the span tags or how to get just the text for my analysis.


So do you want to remove it, or get the text? –


Yes, I need to get the text, in tabular form, so that I can use it as a pandas DataFrame. –

Answer


You can use BeautifulSoup's unwrap() method, which removes a tag but keeps its contents in place.

INFYSoup = soup(INFYHis, 'html.parser')

for match in INFYSoup.find_all('span'):  # add these two lines
    match.unwrap()                       # to strip each <span> tag but keep its text

# then proceed as usual
INFYtd = INFYSoup.findAll("td", {"class": "Py(10px)"})

for child in INFYtd:
    print(child)  # print() is a function in Python 3

Demo:

<td class="Py(10px) Ta(start)" data-reactid="53">31-Jul-2017</td> 
<td class="Py(10px)" data-reactid="55">991.90</td> 
... 
... 
<td class="Py(10px)" data-reactid="1540">992.59</td> 
<td class="Py(10px)" data-reactid="1542">30,89,588</td> 

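Just add those two extra lines to strip the <span> tags from INFYSoup before extracting the td elements with the Py(10px) class.

This is based on the duplicate answer linked in the comments (Removing span tags from soup BeautifulSoup/Python).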
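Since the asker's comment says the end goal is the text rather than the tags, here is a minimal sketch of an alternative that reads each cell's text with get_text() instead of unwrapping the spans first. It runs on a static sample copied from the question's HTML, so it does not depend on the live Yahoo page. The wrapping <table> and <tr> tags, the row_texts helper, and the variable names are assumptions made for illustration and are not from the original post.

```python
from bs4 import BeautifulSoup

# Static sample taken from the question's HTML. The <table>/<tr> wrapper is an
# assumption so that the cells can be grouped into rows.
html = (
    '<table><tr>'
    '<td class="Py(10px) Ta(start)" data-reactid="53"><span data-reactid="54">31-Jul-2017</span></td>'
    '<td class="Py(10px)" data-reactid="55"><span data-reactid="56">991.90</span></td>'
    '<td class="Py(10px)" data-reactid="57"><span data-reactid="58">1,021.70</span></td>'
    '<td class="Py(10px)" data-reactid="59"><span data-reactid="60">986.75</span></td>'
    '<td class="Py(10px)" data-reactid="61"><span data-reactid="62">1,011.20</span></td>'
    '</tr></table>'
)

page = BeautifulSoup(html, "html.parser")


def row_texts(tr):
    """Hypothetical helper: stripped text of every <td> in one <tr>."""
    # get_text() returns the text of the cell including nested tags such as
    # <span>, so the spans never need to be removed.
    return [td.get_text(strip=True) for td in tr.find_all("td")]


# One list of strings per table row.
rows = [row_texts(tr) for tr in page.find_all("tr")]
print(rows)
```

The resulting list of rows can then be passed to pandas.DataFrame(rows, columns=...) with whatever column names suit the analysis. The values are still strings at this point, and figures such as 1,021.70 contain a thousands separator that has to be removed before converting them to numbers.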


Thanks. I tried your code and it removed the spans, but I found another piece of code and used that to get it working. –


@KeertheshKumar, glad to hear you got it working! Well done! – davedwards