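Extract the first 2 table cells in all table rows with Nokogiri

I have a table and want to use Nokogiri to extract the contents of the first two cells in each table row, but I'm having some trouble. Here is what I have so far. Can anyone help? Thanks.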
```ruby
require 'nokogiri'

@doc = Nokogiri::HTML::DocumentFragment.parse <<-EOHTML
<body>
<div class="c">
<table>
<tr>
<td>test</td><td>test</td><td>test</td><td>test</td>
</tr>
<tr class="even">
<td>test</td><td>test</td><td>test</td><td>test</td>
</tr>
<tr>
<td>test</td><td>test</td><td>test</td><td>test</td>
</tr>
<tr class="even">
<td>test</td><td>test</td><td>test</td><td>test</td>
</tr>
</table>
</div>
</body>
EOHTML

@doc.css("div.c > table").search("table/tr/td")
# => ...
@doc.css("div.c > table").search("table/tr/td[position()>2]")
```
```
Nokogiri::CSS::SyntaxError: unexpected '>' after '#<Nokogiri::CSS::Node:0x2b7bc20>'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/css/parser_extras.rb:87:in `on_error'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/1.9.1/racc/parser.rb:99:in `_racc_do_parse_c'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/1.9.1/racc/parser.rb:99:in `do_parse'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/css/parser_extras.rb:62:in `parse'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/css/parser_extras.rb:79:in `xpath_for'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/css.rb:23:in `xpath_for'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/xml/node_set.rb:111:in `block (2 levels) in css'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/xml/node_set.rb:109:in `map'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/xml/node_set.rb:109:in `block in css'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/xml/node_set.rb:239:in `block in each'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/xml/node_set.rb:238:in `upto'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/xml/node_set.rb:238:in `each'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/xml/node_set.rb:105:in `css'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/xml/node_set.rb:83:in `block in search'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/xml/node_set.rb:80:in `each'
from C:/RailsInstaller/Ruby1.9.2/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0-x86-mingw32/lib/nokogiri/xml/node_set.rb:80:in `search'
from (irb):27
from C:/RailsInstaller/Ruby1.9.2/bin/irb:12:in `<main>'
```
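Hi Vivien, your approach treats all the matching table cells the same way. In my example, the two cells in each row are actually related, and I need their values together. Is there a way to extract them while keeping that relationship? For example, how can I get the first 2 cells in each row and concatenate them? Thanks. – 2012-02-13 15:00:52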
@Yousui Get the first `td` of each row, and while iterating over them use [`td.next_element`](http://nokogiri.org/Nokogiri/XML/Node.html#method-i-next_element) to find the second `td` in that row. – Phrogz 2012-02-13 23:23:18