2011-08-21

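Submitting nested forms with Python mechanize

I want to submit a login form on a web page that looks like the one below. I have tried submitting the nested form, and I have also tried submitting both forms; I get the same error every time.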

<form method="post" name="loginform"> 
    <input type='hidden' name='login' value='1'> 
    <form action="#" method="post" id="login"> 
      Username 
      <input type="text" name="username" id="username" /> 
      Password 
      <input type="password" name="password" id="password" /> 
      <input type="submit" value='Login' class="submit" /> 

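This is the Python script I am using. I also noticed that the forms are never closed with </form>; I am not sure whether that is related to my problem.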

from mechanize import Browser 

br = Browser() 

br.set_handle_robots(False) 
br.addheaders = [('User-agent', 'Firefox')] 

br.open('http://www.example.com/') 

br.select_form(name="loginform") 

br['login'] = '1' 
br['username'] = 'user' 
br['password'] = 'pass' 

resp = br.submit() 

The error I get is:

ParseError: nested FORMs 

Edit:

import mechanize 
from BeautifulSoup import MinimalSoup 

class PrettifyHandler(mechanize.BaseHandler): 
    def http_response(self, request, response): 
        if not hasattr(response, "seek"):
            response = mechanize.response_seek_wrapper(response)
        # only use BeautifulSoup if response is html
        if response.info().dict.has_key('content-type') and ('html' in response.info().dict['content-type']):
            soup = MinimalSoup(response.get_data())
            response.set_data(soup.prettify())
        return response

br = mechanize.Browser() 
br.add_handler(PrettifyHandler()) 

br.open('http://example.com/') 

br.select_form(nr=1) 
br.form['username'] = 'mrsmith' 
br.form['password'] = '123abc' 
resp = br.submit() 

print resp.read() 

Answers


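You can try finding the offending part of the page and fixing it by hand. For example, I had a page with the same nested-forms problem, and I found that there was a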

<FORM></FORM> 

sitting inside another form block. I also needed to remove the first line because it was badly formed. So you could try something like this:

... 
resp = br.open(url) # Load login page 
# the [111:] slice drops the first 111 chars of the response 
# the .replace('<FORM></FORM>','') removes the bad HTML 
resp.set_data(resp.get_data()[111:].replace('<FORM></FORM>','')) 
br.set_response(resp)
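
A more general variant of the same idea, as a minimal sketch: instead of removing one hard-coded string, drop every <form> tag that opens inside another form, along with its matching </form> if there is one. The helper name strip_nested_forms and the regular expression are illustrative assumptions, not part of mechanize, and a regex is only a rough stand-in for a real HTML parser. The sample page is a shortened copy of the markup from the question.

```python
import re

# Matches an opening <form ...> tag or a closing </form> tag, in any case.
FORM_TAG = re.compile(r'<\s*(/?)\s*form\b[^>]*>', re.IGNORECASE)


def strip_nested_forms(html):
    """Hypothetical helper: keep only outermost form tags.

    Any <form> opened while another form is still open is dropped,
    together with its matching </form> when one exists.
    """
    depth = 0   # number of forms currently open
    out = []    # pieces of the cleaned document
    pos = 0     # end of the last tag that was processed
    for match in FORM_TAG.finditer(html):
        out.append(html[pos:match.start()])
        pos = match.end()
        if match.group(1) == '/':
            # Closing tag: keep it only when it closes the outermost form.
            if depth == 1:
                out.append(match.group(0))
            depth = max(depth - 1, 0)  # a stray </form> is simply dropped
        else:
            # Opening tag: keep it only when no form is open yet.
            depth += 1
            if depth == 1:
                out.append(match.group(0))
    out.append(html[pos:])
    return ''.join(out)


# Shortened version of the markup from the question (forms never closed).
page = '''<form method="post" name="loginform">
<input type='hidden' name='login' value='1'>
<form action="#" method="post" id="login">
<input type="text" name="username" id="username" />
<input type="password" name="password" id="password" />
<input type="submit" value='Login' class="submit" />
'''

cleaned = strip_nested_forms(page)
```

The cleaned string could then be handed back with resp.set_data(...) and br.set_response(resp) as in the snippet above. Note that dropping the inner tag also drops its action="#" attribute, so the remaining outer form should post to the page's own URL, which is worth checking against what the site expects.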