Using Mechanize with Google Docs

I'm trying to use Mechanize to log in to Google Docs so that I can scrape some things (not available through the API), but I always seem to get a 404 when trying to follow the meta redirect:

require 'rubygems' 
require 'mechanize' 

USERNAME = "..." 
PASSWORD = "..." 

LOGIN_URL = "https://www.google.com/accounts/Login?hl=en&continue=http://docs.google.com/" 

agent = Mechanize.new 
login_page = agent.get(LOGIN_URL) 
login_form = login_page.forms.first 
login_form.Email = USERNAME 
login_form.Passwd = PASSWORD 
login_response_page = agent.submit(login_form) 

redirect = login_response_page.meta[0].uri.to_s 

puts "redirect: #{redirect}" 

followed_page = agent.get(redirect) # throws a HTTPNotFound exception 

pp followed_page

Can anyone see why this isn't working?

Source

2010-06-08 Andy Waite

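Andy, you're awesome! Your code helped me get my script working and log in to a Google account. After a couple of hours I found your bug. It's about HTML escaping. As I found out, Mechanize automatically escapes the URI passed as the argument to the get method, so a URI taken from the meta tag, which is already escaped, ends up escaped twice. So my solution is: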

require 'rubygems'
require 'mechanize'
require 'logger'
require 'pp'

EMAIL = ".."
PASSWD = ".."
LOGIN_URL = "https://www.google.com/accounts/Login?hl=en"

# Log requests and responses to mech.log for debugging.
agent = Mechanize.new { |a| a.log = Logger.new("mech.log") }
agent.user_agent_alias = 'Linux Mozilla'
agent.open_timeout = 3
agent.read_timeout = 4
agent.keep_alive = true
agent.redirect_ok = true

login_page = agent.get(LOGIN_URL)
login_form = login_page.forms.first
login_form.Email = EMAIL
login_form.Passwd = PASSWD
login_response_page = agent.submit(login_form)

redirect = login_response_page.meta[0].uri.to_s

# Drop the last query parameter (assumed to be the already-escaped continue
# value from the meta tag) and append an unescaped one in its place.
target = redirect.split('&')[0..-2].join('&') + "&continue=https://www.google.com/adplanner"
puts target
followed_page = agent.get(target)
pp followed_page

This works fine for me. I replaced the continue parameter taken from the meta tag (which is already escaped) with an unescaped one.
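Dropping the last parameter with split('&') only works if continue happens to be the final one. A slightly more defensive variant, sketched here with Ruby's standard uri and cgi libraries only, decodes the query string once, swaps the continue value, and encodes it once. The helper name replace_continue and the sample URL are made up for illustration; neither is part of Mechanize or taken from a real Google response, and this has not been tried against the real login flow.

```ruby
require 'uri'
require 'cgi'

# Hypothetical helper (not part of Mechanize): takes the URL string read from
# the meta refresh tag and returns it with the continue parameter replaced.
def replace_continue(redirect, new_continue)
  # Undo HTML entity escaping such as "&amp;" left over from the page source.
  uri = URI.parse(CGI.unescapeHTML(redirect))
  # decode_www_form percent-decodes every key and value exactly once.
  params = URI.decode_www_form(uri.query || '')
  # Remove any existing continue parameter, wherever it appears in the query.
  params.reject! { |key, _value| key == 'continue' }
  params << ['continue', new_continue]
  # encode_www_form percent-encodes once, so no value is escaped twice here.
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Made-up redirect URL, for illustration only.
sample = 'https://example.com/accounts/next?hl=en&amp;continue=http%3A%2F%2Fdocs.example.com%2F'
puts replace_continue(sample, 'https://www.google.com/')
```

The returned string could then be passed to agent.get in place of the split/join expression. Whether Mechanize escapes it again depends on the Mechanize version, so checking the requested URL in mech.log is still worthwhile.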

Source

2011-04-02 13:12:28
