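Extracting one row of a table from a URL

I want to download the EPS values for all years (under Annual Trends) from the link below. http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=500180&expandable=0

I tried using BeautifulSoup, as mentioned in the answer to "Extracting table contents from html with python and BeautifulSoup", but I could not get any further than the code below. I feel I am very close to the answer. Any help would be appreciated.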

from bs4 import BeautifulSoup 
import urllib2 
html = urllib2.urlopen("http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=500180&expandable=0").read() 
soup=BeautifulSoup(html) 
table = soup.find('table',{'id' :'acr'}) 
#the below code wasn't working as I expected it to be 
tr = table.find('tr', text='EPS')

I am open to using any other language to get this done.

Source

2016-09-23 user1638998

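What result did you get that you did not expect? –

The tr object is empty – user1638998

The text is in the td, not the tr, so find the td using the text and then call .parent to get the tr: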

In [12]: table = soup.find('table',{'id' :'acr'}) 

In [13]: tr = table.find('td', text='EPS').parent 

In [14]: print(tr) 
<tr><td class="TTRow_left" style="padding-left: 30px;">EPS</td><td class="TTRow_right">48.80</td> 
<td class="TTRow_right">42.10</td> 
<td class="TTRow_right">35.50</td> 
<td class="TTRow_right">28.50</td> 
<td class="TTRow_right">22.10</td> 
</tr> 
In [15]: [td.text for td in tr.select("td + td")] 
Out[15]: [u'48.80', u'42.10', u'35.50', u'28.50', u'22.10']

As you can see, this exactly matches what is on the page.

Another way is to call find_next_siblings:
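To see why searching on the tr comes back empty while searching on the td works, here is a minimal offline sketch. It assumes Python 3 and a recent bs4 (so `string=` is used in place of the older `text=` argument). `SAMPLE_HTML` is a hypothetical, trimmed stand-in for the real page: the "Other" row and its values are placeholders, and only the EPS numbers are taken from the output above. `row_values` is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table; the "Other" row is a placeholder.
SAMPLE_HTML = """
<table id="acr">
<tr><td class="TTRow_left">Other</td><td class="TTRow_right">1.00</td><td class="TTRow_right">2.00</td></tr>
<tr><td class="TTRow_left">EPS</td><td class="TTRow_right">48.80</td><td class="TTRow_right">42.10</td><td class="TTRow_right">35.50</td><td class="TTRow_right">28.50</td><td class="TTRow_right">22.10</td></tr>
</table>
"""


def row_values(html, label, table_id="acr"):
    """Return the cell texts that follow the cell whose text equals `label`."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        return []
    # The label text lives in a td, so match on the td rather than the tr.
    label_td = table.find("td", string=label)
    if label_td is None:
        return []
    tr = label_td.parent  # climb from the matched cell to its row
    return [td.get_text(strip=True) for td in tr.find_all("td")[1:]]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# A tr with several td children has no single string of its own,
# so filtering tr elements by their string finds nothing.
tr_by_text = soup.find("tr", string="EPS")
eps = row_values(SAMPLE_HTML, "EPS")
print(tr_by_text)
print(eps)
```

Returning an empty list when the table or label is missing is a design choice for this sketch; raising an error may suit a real scraper better.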

In [17]: tds = table.find('td', text='EPS').find_next_siblings("td") 

In [18]: tds 
Out[18]: 
[<td class="TTRow_right">48.80</td>, 
<td class="TTRow_right">42.10</td>, 
<td class="TTRow_right">35.50</td>, 
<td class="TTRow_right">28.50</td>, 
<td class="TTRow_right">22.10</td>] 
In [20]: [td.text for td in tds] 
Out[20]: [u'48.80', u'42.10', u'35.50', u'28.50', u'22.10']
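If numbers are needed rather than strings, the sibling approach can be wrapped in a small function. This is only a sketch under the same assumptions (Python 3, a recent bs4, and an inline hypothetical `SAMPLE_HTML` that copies just the EPS row shown above); `sibling_floats` is an illustrative name, and it assumes every value cell parses as a plain decimal.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row stand-in for the table; values copied from the output above.
SAMPLE_HTML = (
    '<table id="acr"><tr>'
    '<td class="TTRow_left">EPS</td>'
    '<td class="TTRow_right">48.80</td>'
    '<td class="TTRow_right">42.10</td>'
    '<td class="TTRow_right">35.50</td>'
    '<td class="TTRow_right">28.50</td>'
    '<td class="TTRow_right">22.10</td>'
    '</tr></table>'
)


def sibling_floats(html, label):
    """Find the td whose text is `label` and parse the td cells after it as floats."""
    soup = BeautifulSoup(html, "html.parser")
    label_td = soup.find("td", string=label)
    if label_td is None:
        return []
    # find_next_siblings only walks cells in the same row, after the label cell.
    return [float(td.get_text(strip=True)) for td in label_td.find_next_siblings("td")]


eps_values = sibling_floats(SAMPLE_HTML, "EPS")
print(eps_values)
```

On the live page the same function would take the downloaded HTML in place of `SAMPLE_HTML`, provided the page still renders the table in the markup shown above.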

Source

2016-09-23 18:17:56