2010-11-01

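Ruby Mechanize throws an error: undefined method `<=>'

I'm using Ruby Mechanize to scrape some HTML. When I load my page and display the necessary results, the page loads fine. After a reload, when I call "search_results = @agent.submit(search_form)", I get this error: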

undefined method `<=>' for {emptyelem <input name="hl" value="en" type="hidden">}:Hpricot::Elem 

Before I post any code, does this ring any bells?

Thanks.

Code:

start = Time.now 

    # initial set up 
    @agent = Mechanize.new 
    Mechanize.html_parser = Hpricot 
    page = @agent.get("http://www.google.com/") 
    search_form = page.forms.first 

    # conduct initial search 
    @search_term = search_form.q = params[:search].to_s 
    search_results = @agent.submit(search_form) 

    # helper variables 
    search_qs = ""; @page_number = 1; i = 0; @flag = false; 

    # get the query string structure 
    search_results.links.each { |li| search_qs = li.href if li.href.match(/.*search\?q=.*start=.*/) } 

    # search through all paginated pages 
    while (i < 500) 
     search_qs = search_qs.gsub(/start=\d+/,"start=#{i}") 
     @search_url = "http://google.com#{search_qs}" 
     search_results = @agent.get(@search_url) 
     search_results.links.each { |li| @flag = true if li.text.match("All Bout Texas Tailgating") } 
     break if @flag 
     i+=10; @page_number+=1 
    end 

@execution_time = Time.now-start 

render :layout => false 

VIEW:

<h2>Query results for "<%= @search_term %>" on Google</h2> 

<% if @flag %> 
    <p>What page is this keyword found: <b><%= @page_number %></b></p> 
    <p><%= link_to "Click to see page", "#{@search_url}", {:target => "_blank"} %></p> 
    <p>How long did this query take to run?: <%= @execution_time %> seconds</p> 
<% else %> 
    <p>Keyword not found in Google search results</p> 
<% end %> 

Stack trace:

NoMethodError (undefined method `<=>' for {emptyelem <input name="hl" value="en" type="hidden">}:Hpricot::Elem): 
    mechanize (1.0.0) lib/mechanize/form/field.rb:30:in `<=>' 
    mechanize (1.0.0) lib/mechanize/form.rb:171:in `sort' 
    mechanize (1.0.0) lib/mechanize/form.rb:171:in `build_query' 
    mechanize (1.0.0) lib/mechanize.rb:373:in `submit' 
    app/controllers/admin/importer_controller.rb:24:in `check_page_rank' 
    /opt/local/lib/ruby/1.8/webrick/httpserver.rb:104:in `service' 
    /opt/local/lib/ruby/1.8/webrick/httpserver.rb:65:in `run' 
    /opt/local/lib/ruby/1.8/webrick/server.rb:173:in `start_thread' 
    /opt/local/lib/ruby/1.8/webrick/server.rb:162:in `start' 
    /opt/local/lib/ruby/1.8/webrick/server.rb:162:in `start_thread' 
    /opt/local/lib/ruby/1.8/webrick/server.rb:95:in `start' 
    /opt/local/lib/ruby/1.8/webrick/server.rb:92:in `each' 
    /opt/local/lib/ruby/1.8/webrick/server.rb:92:in `start' 
    /opt/local/lib/ruby/1.8/webrick/server.rb:23:in `start' 
    /opt/local/lib/ruby/1.8/webrick/server.rb:82:in `start' 

Rendered rescues/_trace (98.4ms) 
Rendered rescues/_request_and_response (1.2ms) 
Rendering rescues/layout (internal_server_error) 

Answer

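If you look through the Mechanize source in form.rb, submitting the form calls a function named build_query, which sorts the fields on the form. The sort relies on the <=> operator, which the stack trace shows each field forwarding to its underlying node (field.rb:30). That operator is not defined on Hpricot elements, so you get the exception.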
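To illustrate the failure mode, here is a minimal sketch using stand-in classes. BareNode, ComparableNode, SketchField and sort_fields are hypothetical names for this example, not Mechanize or Hpricot code. It assumes, based on the stack trace, that a field's comparison is forwarded to its node. Current Ruby versions give every Object a default <=>, so the sketch removes it to mimic a node class that has none.

```ruby
# Stand-in for a parser node with no <=> (hypothetical; mimics the
# Hpricot::Elem situation in the trace). The inherited default is undefined.
class BareNode
  undef_method :<=>
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Stand-in for a parser node that can be ordered (hypothetical).
class ComparableNode
  attr_reader :position

  def initialize(position)
    @position = position
  end

  def <=>(other)
    position <=> other.position
  end
end

# Simplified field wrapper (assumption): ordering is delegated to the node.
class SketchField
  attr_reader :node

  def initialize(node)
    @node = node
  end

  def <=>(other)
    node <=> other.node
  end
end

# Sort the fields; report the error class if the nodes cannot be compared.
def sort_fields(fields)
  fields.sort
rescue NoMethodError => e
  "sort failed: #{e.class}"
end

ok  = sort_fields([SketchField.new(ComparableNode.new(2)),
                   SketchField.new(ComparableNode.new(1))])
bad = sort_fields([SketchField.new(BareNode.new("hl")),
                   SketchField.new(BareNode.new("q"))])
```

With comparable nodes the sort succeeds; with nodes that lack <=> the same call fails inside the field's comparison, which matches the shape of the trace above.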

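Mechanize appears to be built around Nokogiri, and it may have unfixed bugs with other parser implementations. I haven't dug deep into the Mechanize source and don't want to blame anyone, but you may want to try switching to Nokogiri for this project (if possible). From this snippet, it looks like you are relying on Hpricot. My read is that Mechanize is throwing the exception on a hidden form field parsed by Hpricot; the stack trace is very clear on that point.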

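Your other main option is to dive into the Mechanize source and see whether you can fix it yourself (or file a bug on the Mechanize GitHub and hope someone picks it up).

Good luck.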
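As a rough idea of what a local fix could look like, the sketch below orders fields with a fallback: when the nodes cannot be compared, it keeps the order in which the fields appeared. This is not Mechanize's actual code or an official patch. PlainNode, OrderedNode, DemoField and sort_preserving_order are hypothetical names, and falling back to the original order is an assumption about what behavior would be acceptable.

```ruby
# Stand-in node without <=> (hypothetical, mimics the failing case).
class PlainNode
  undef_method :<=>
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Stand-in node that supports ordering (hypothetical).
class OrderedNode
  attr_reader :position

  def initialize(position)
    @position = position
  end

  def <=>(other)
    position <=> other.position
  end
end

# Minimal field wrapper holding a node (hypothetical).
DemoField = Struct.new(:node)

# Sort fields by their nodes when possible; otherwise, or on ties,
# fall back to the index so the original order is preserved.
def sort_preserving_order(fields)
  fields.each_with_index.sort { |(a, i), (b, j)|
    cmp = a.node.respond_to?(:<=>) ? (a.node <=> b.node) : nil
    (cmp.nil? || cmp.zero?) ? (i <=> j) : cmp
  }.map(&:first)
end

plain   = %w[hl q btnG].map { |n| DemoField.new(PlainNode.new(n)) }
ordered = [3, 1, 2].map { |p| DemoField.new(OrderedNode.new(p)) }

plain_sorted   = sort_preserving_order(plain)
ordered_sorted = sort_preserving_order(ordered)
```

Checking respond_to? before comparing avoids the NoMethodError entirely, at the cost of not reordering fields whose nodes have no ordering of their own.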

The code is above.... – Josh 2010-11-01 22:17:14

It seems to break at: search_results = @agent.submit(search_form), but only on the second reload of the page, not the first – Josh 2010-11-01 22:25:13

Do you have a stack trace from the exception? – 2010-11-01 22:25:16