2011-09-19 104 views
4

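Find the values of input fields in an HTML document using Python

I want to read input values from an HTML document, specifically the values of hidden input fields. For example, how can I parse out just the values in the snippet below using Python?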

<input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" /> 
    <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" /> 

The output of the Python function should look something like:

post_form_id : d619a1eb3becdc05a3ebea530396782f 
fb_dtsg : AQCYsohu 

See this nice answer here: http://stackoverflow.com/a/11205758/609782 – Darpan

Answers

6

You can use BeautifulSoup:

>>> htmlstr = """ <input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" /> 
...  <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />""" 
>>> from BeautifulSoup import BeautifulSoup 
>>> soup = BeautifulSoup(htmlstr) 
>>> [(n['name'], n['value']) for n in soup.findAll('input')] 
[(u'post_form_id', u'd619a1eb3becdc05a3ebea530396782f'), (u'fb_dtsg', u'AQCYsohu')] 
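The session above uses the BeautifulSoup 3 import and shows Python 2 output. As a rough sketch of the same idea with the current `bs4` package on Python 3, the version below also restricts the match to `type="hidden"`, which is what the question asks for. The function name `hidden_inputs` and the extra visible `q` field are assumptions added only for illustration; they are not part of the original answer.

```python
from bs4 import BeautifulSoup

htmlstr = """
<input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
<input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />
<input type="text" name="q" value="visible" />
"""  # the last (visible) input is hypothetical, added to show the filter


def hidden_inputs(html):
    """Return a dict mapping name -> value for every hidden <input> that has a name."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    # attrs={"type": "hidden"} keeps only inputs whose type attribute is "hidden"
    for tag in soup.find_all("input", attrs={"type": "hidden"}):
        name = tag.get("name")
        if name is not None:
            # Fall back to an empty string when the value attribute is missing
            fields[name] = tag.get("value", "")
    return fields


for name, value in hidden_inputs(htmlstr).items():
    print("%s : %s" % (name, value))
```

Using `tag.get(...)` instead of `tag['name']` avoids a `KeyError` on inputs that lack a `name` or `value` attribute.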

Thanks for suggesting BeautifulSoup, this is better than what I had been looking for. – Vlad

3

Or with lxml:

import lxml.html 

htmlstr = ''' 
    <input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" /> 
    <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" /> 
''' 

# Parse the string and turn it into a tree of elements
htmltree = lxml.html.fromstring(htmlstr)

# Iterate over each input element in the tree and print the relevant attributes
for input_el in htmltree.xpath('//input'):
    name = input_el.attrib['name']
    value = input_el.attrib['value']

    # print() with a single argument works on both Python 2 and Python 3
    print("%s : %s" % (name, value))

This gives:

 
post_form_id : d619a1eb3becdc05a3ebea530396782f 
fb_dtsg : AQCYsohu
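
Where installing a third-party parser is not an option, roughly the same extraction can be sketched with the standard library's `html.parser` module on Python 3. This is not from either answer above: the class name `HiddenInputParser` and the helper `parse_hidden_inputs` are hypothetical names chosen for this example, and the sketch assumes that each hidden field's `name` appears only once.

```python
from html.parser import HTMLParser

htmlstr = '''
    <input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
    <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />
'''


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also reach this method,
        # because the default handle_startendtag() calls handle_starttag().
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            # A valueless attribute is reported as None, so normalise to ""
            self.fields[attr["name"]] = attr.get("value") or ""


def parse_hidden_inputs(html):
    """Return a dict of hidden input names mapped to their values."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields


for name, value in parse_hidden_inputs(htmlstr).items():
    print("%s : %s" % (name, value))
```

Returning a dict rather than printing inside the parser keeps the result reusable, for example when the values have to be sent back in a later form submission.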