2016-12-14 102 views

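Unable to access the complete web page using mechanize

I am trying to save the home page of usautoforce using mechanize. @Ertugrul, thanks to your answer I now get the complete page. But when I try to access the username and password fields, it raises an error. I have already set all controls to non-read-only. When I open the saved page in an editor, there is no HTML that refers to the username or password fields. Here is my mechanize code: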

import re
import time

import mechanize

br = mechanize.Browser()

br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_robots(False)  # do not fetch or obey robots.txt
#br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Upgrade-Insecure-Requests', '1'),
    ('Connection', 'keep-alive'),
]

br.open("http://www.usautoforce.com/Pages/home.aspx")
print(br.response())  # response object for the page just opened
time.sleep(9)

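latest_index = 0
html_replaced = ""
html = br.response().read()

# Prefix every root-relative href/src value (one starting with "/") with the site origin.
for m in re.finditer('(href|src)(=")(/[^"]+")', html):
    html_replaced += html[latest_index:m.start()] + m.group(1) + m.group(2) + 'http://www.usautoforce.com' + m.group(3)
    latest_index = m.end()

# Append whatever follows the last match; otherwise the end of the page is lost.
html_replaced += html[latest_index:]

with open("us.html", "w") as f:
    f.write(html_replaced)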

print(list(br.forms())[0])  # show the first form on the page

br.select_form(nr=0)
time.sleep(2)

# for control in br.form.controls:
#     print(control)
#     print("type=%s, name=%s value=%s" % (control.type, control.name, br[control.name]))

br.form.set_all_readonly(False)
br.form["nexpartuname"] = "abc"  # this assignment raises ControlNotFoundError
br.form["pwd"] = "xyz"
br.submit()

Here is the error:

  File "haha.py", line 60, in <module>
    br.form["nexpartuname"] = "clack"
  File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 2775, in __setitem__
    control = self.find_control(name)
  File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 3096, in find_control
    return self._find_control(name, type, kind, id, label, predicate, nr)
  File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 3180, in _find_control
    raise ControlNotFoundError("no control matching "+description)
mechanize._form.ControlNotFoundError: no control matching name 'nexpartuname'

Answer


Mechanize does not execute JavaScript. The site you are trying to access also says 'Please enable scripts...'.
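One way to check whether a field such as nexpartuname is present in the static HTML that was downloaded, rather than being added later by scripts, is to list the input names in the saved page. The following is only a minimal sketch using the standard library; InputNameCollector, input_names and the sample markup are hypothetical and are not taken from the real usautoforce page.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class InputNameCollector(HTMLParser):
    """Collect the name attribute of every <input> tag in static HTML."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.names = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<input ... />) also end up here by default.
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def input_names(html):
    """Return the names of all <input> elements found in the given HTML text."""
    collector = InputNameCollector()
    collector.feed(html)
    return collector.names


# Hypothetical sample markup, used only to illustrate the check.
sample = '<form><input type="text" name="q"><input type="hidden" name="token"/></form>'
print(input_names(sample))
print("nexpartuname" in input_names(sample))
```

If the name is missing from that list, mechanize cannot find the control either, because it only sees the HTML exactly as the server sent it.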

Since JavaScript cannot be enabled in mechanize, I would personally suggest using PhantomJS.

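But the real problem here is not JavaScript; it is the URLs. Because the URLs on that site are relative, the downloaded HTML does not behave as expected when you open it locally.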

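You have to convert all relative URLs to absolute URLs. Run this code before writing the HTML to the file, and write the html_replaced string to the file instead of the html string.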

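latest_index = 0
html_replaced = ""

# Prefix every root-relative href/src value (one starting with "/") with the site origin.
for m in re.finditer('(href|src)(=")(/[^"]+")', html):
    html_replaced += html[latest_index:m.start()] + m.group(1) + m.group(2) + 'http://www.usautoforce.com' + m.group(3)
    latest_index = m.end()

# Append whatever follows the last match; otherwise the end of the page is lost.
html_replaced += html[latest_index:]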
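
The loop above only rewrites double-quoted values that start with "/". A more general variant could lean on the standard library's urljoin, which also resolves paths without a leading slash and protocol-relative //host/... URLs. This is only a sketch under assumptions: absolutize, ATTR_RE, the sample markup and the paths in it are made up for illustration, and only double-quoted href/src attributes are handled.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

# Matches href="..." or src="..." with a double-quoted value.
ATTR_RE = re.compile(r'(href|src)(=")([^"]*)(")')


def absolutize(html, base_url):
    """Resolve every double-quoted href/src value against base_url."""
    def fix(match):
        attr, opener, url, closer = match.groups()
        return attr + opener + urljoin(base_url, url) + closer
    # re.sub leaves all text outside the matches untouched.
    return ATTR_RE.sub(fix, html)


base = "http://www.usautoforce.com/Pages/home.aspx"
# Hypothetical markup; the paths and the CDN host are illustrative only.
sample = ('<a href="/Pages/login.aspx">x</a>'
          '<img src="img/logo.png">'
          '<script src="//cdn.example.com/a.js"></script>')
print(absolutize(sample, base))
```

Values that are already absolute come back from urljoin unchanged, so running the function on a page twice should not damage it.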

But it works when I open the page manually in a browser with JavaScript disabled. – user3809411


@user3809411 You are right. The real problem is the relative URLs. Please check the updated answer. –


Thank you. It works now. – user3809411