Finding the values of input fields in an HTML document using Python

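I want to get input values from an HTML document, and in particular to parse out the values of hidden input fields. For example, how can I use Python to extract just the values from the snippet below?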

<input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" /> 
    <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />

The Python function's output should look something like this:

post_form_id : d619a1eb3becdc05a3ebea530396782f 
fb_dtsg : AQCYsohu

Source

2011-09-19 Vlad

Check out this great answer: http://stackoverflow.com/a/11205758/609782 – Darpan

You can use BeautifulSoup:

>>> htmlstr = """ <input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
...  <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />"""
>>> from bs4 import BeautifulSoup  # BeautifulSoup 4; the old BeautifulSoup 3 import was "from BeautifulSoup import BeautifulSoup"
>>> soup = BeautifulSoup(htmlstr, 'html.parser')
>>> [(n['name'], n['value']) for n in soup.find_all('input')]
[('post_form_id', 'd619a1eb3becdc05a3ebea530396782f'), ('fb_dtsg', 'AQCYsohu')]

Source

2011-09-19 16:34:48 jterrace

Thanks for suggesting BeautifulSoup; it's better than what I had been looking for. – Vlad

Or with lxml:

import lxml.html

htmlstr = '''
    <input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
    <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />
'''

# Parse the string and turn it into a tree of elements
htmltree = lxml.html.fromstring(htmlstr)

# Iterate over each input element in the tree and print the relevant attributes
for input_el in htmltree.xpath('//input'):
    name = input_el.attrib['name']
    value = input_el.attrib['value']

    print("%s : %s" % (name, value))

This gives:

 
post_form_id : d619a1eb3becdc05a3ebea530396782f 
fb_dtsg : AQCYsohu

Source

2011-09-19 17:16:28 Acorn
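
Both answers above collect every input element, while the question asks specifically for hidden fields. If installing a third-party parser is not an option, something along these lines might work using only the standard library's html.parser module. This is a minimal sketch, not taken from the answers: the names HiddenInputParser and hidden_inputs are made up for illustration, and the extra "email" text field in the sample is an assumption added only to show that non-hidden inputs are skipped.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Hypothetical helper: collects name/value pairs of <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) tuples. Self-closing tags such as <input ... /> are
        # routed here too by the default handle_startendtag implementation.
        if tag != "input":
            return
        attr_map = dict(attrs)
        is_hidden = (attr_map.get("type") or "").lower() == "hidden"
        if is_hidden and attr_map.get("name"):
            # A missing or valueless "value" attribute is stored as "".
            self.fields[attr_map["name"]] = attr_map.get("value") or ""


def hidden_inputs(html):
    """Return a dict mapping hidden input names to their values."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields


htmlstr = '''
    <input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
    <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />
    <input type="text" name="email" value="ignored" />
'''

for name, value in hidden_inputs(htmlstr).items():
    print("%s : %s" % (name, value))
```

With the sample string above, only post_form_id and fb_dtsg should be printed. For real-world pages with badly broken markup, BeautifulSoup or lxml is likely to be more forgiving than this sketch.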
