2013-03-19 103 views
-4

I have the following HTML and want to read an element from it:

<tr style='background:#DDDDDD;'> 
    <td><b>ASD</b></td> 
    <td colspan='3'>1231</td> 
</tr> 

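This element is not repeated anywhere else on the page, so it is unique. I want to get the content of the cell (1231) into a variable. I tried using HTML.parser, but it doesn't work.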

+2

Can you show us what you tried? – 2013-03-19 20:03:15

Answers

0

Take a look at BeautifulSoup, it is great for this:

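from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(html)  # feed your html page to BeautifulSoup

# find the text node "ASD" (the label in the first cell)
pleaseFind = soup.find(text="ASD")

# the next <td> after that label is the cell holding the value
whatINeed = pleaseFind.findNext('td')

print whatINeed.text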
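
If installing BeautifulSoup is not an option, the same "find the label, then take the next cell" idea can probably be done with the standard library alone. Below is a minimal sketch, assuming Python 3 (where the module is called `html.parser`) and assuming the label cell contains exactly the text ASD. The class name `NextCellParser` and the variable `cell_value` are made up for this example and are not part of any library.

```python
from html.parser import HTMLParser


class NextCellParser(HTMLParser):
    """Collect the text of the first <td> that follows a cell containing `label`.

    Hypothetical helper written for this example only.
    """

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.in_td = False        # currently inside a <td> element
        self.label_seen = False   # the label cell has already been closed
        self.capturing = False    # currently inside the cell we want
        self.value = None         # text of the wanted cell, once found
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self._buffer = []
            # The first <td> opened after the label cell is the one we want.
            if self.label_seen and self.value is None:
                self.capturing = True

    def handle_data(self, data):
        # Text inside nested tags such as <b> is collected too.
        if self.in_td:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or not self.in_td:
            return
        text = "".join(self._buffer).strip()
        if self.capturing:
            self.value = text
            self.capturing = False
        elif text == self.label:
            self.label_seen = True
        self.in_td = False


# Sample markup copied from the question.
html = """
<tr style='background:#DDDDDD;'>
    <td><b>ASD</b></td>
    <td colspan='3'>1231</td>
</tr>
"""

parser = NextCellParser("ASD")
parser.feed(html)
parser.close()
cell_value = parser.value
print(cell_value)
```

This does not handle nested tables or a label that appears more than once, so treat it as a starting point rather than a drop-in replacement for BeautifulSoup.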
+0

Soup did the trick :-) Thanks – user2188158 2013-03-19 20:21:10

+0

No problem, you can use it to find anchors, paragraphs, headings, or whatever else you need. – 2013-03-19 20:22:04

0

You can use urllib2 (you don't have to install anything new, at least with the Windows version of Python): http://docs.python.org/2/howto/urllib2.html

Example:

import urllib2 
response = urllib2.urlopen('your URL') 
html = response.read() 
#html is a string containing everything on your page 

#this part (it could be a bit cleaner) finds the substring "<td colspan='3'>" and
#takes everything between the end of it and the next "</td>"
start_tag = "<td colspan='3'>"
pos = html.find(start_tag)
print html[pos+len(start_tag):html.find("</td>", pos)]
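
The same substring search can be wrapped in a small function so that a missing tag doesn't silently give you a wrong slice. Below is a rough sketch, assuming Python 3 (where `urllib2` became `urllib.request`) and using a hard-coded string instead of a downloaded page so it runs without a network connection. The helper name `text_between` is made up for this example.

```python
def text_between(haystack, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Returns None if either marker is not found. Hypothetical helper,
    written for this example only.
    """
    start = haystack.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip past the opening tag itself
    end = haystack.find(end_marker, start)
    if end == -1:
        return None
    return haystack[start:end]


# Sample markup from the question, standing in for response.read().
html = "<tr style='background:#DDDDDD;'><td><b>ASD</b></td><td colspan='3'>1231</td></tr>"

value = text_between(html, "<td colspan='3'>", "</td>")
print(value)
```

Keep in mind this only works while the tag is written exactly as `<td colspan='3'>`; if the page changes the quotes or adds attributes, a real parser is the safer choice.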