How do I find direct children, not nested children, using Rails and Nokogiri?

I'm using Rails 4.2.7 with Ruby 2.3 and Nokogiri. How do I find only the direct tr children of a table, rather than nested ones? Currently I find the rows within each table like this:

tables = doc.css('table')
tables.each do |table|
  rows = table.css('tr')
  # ...
end

This finds not only the table's direct rows, e.g.

<table> 
    <tbody> 
     <tr>…</tr>

but it also finds rows nested inside other rows, e.g.

<table> 
    <tbody> 
     <tr> 
      <td> 
       <table> 
        <tr>This is found</tr> 
       </table> 
      </td> 
     </tr>

How do I refine my search to find only the direct tr elements?


2016-11-30 Dave

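Nokogiri implements CSS selectors, including some jQuery extensions, so get familiar with how stylesheet selectors work and you should have better luck. CSS is more readable, but XPath is more powerful, so it's good to know both. 'tbody' tags are rarely used in generated HTML, but browsers tend to insert them when you inspect a page's HTML. Don't trust the browser; instead, look at the HTML directly using 'wget', 'curl' or 'nokogiri' on the command line. Only use 'tbody' in a selector if the raw HTML contains it. –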

@Dave: Just curious: why would you accept an answer but not upvote it? –

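You can do this in a couple of steps using XPath. First you need to find the "level" of the table (i.e. how deeply it is nested in other tables), then find all descendant tr elements that have the same number of table ancestors: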

tables = doc.xpath('//table') 
tables.each do |table| 
    level = table.xpath('count(ancestor-or-self::table)') 
    rows = table.xpath(".//tr[count(ancestor::table) = #{level}]") 
    # do what you want with rows... 
end

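In the more general case, where you might have tr elements nested directly inside other tr elements, you could do something like this (that would be invalid HTML, but you might have XML or some other tags):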

tables.each do |table| 
    # Find the first descendant tr, and determine its level. This 
    # will be a "top-level" tr for this table. "level" here means how 
    # many tr elements (including itself) are between it and the 
    # document root. 
    level = table.xpath("count(descendant::tr[1]/ancestor-or-self::tr)") 
    # Now find all descendant trs that have that same level. Since 
    # the table itself is at a fixed level, this means all these nodes 
    # will be "top-level" rows for this table. 
    rows = table.xpath(".//tr[count(ancestor-or-self::tr) = #{level}]") 
    # handle rows... 
end

The first step could be split into two separate queries, which may be clearer:

first_tr = table.at_xpath(".//tr") 
level = first_tr.xpath("count(ancestor-or-self::tr)")

(This will fail if there is a table with no tr elements, though, because first_tr will be nil. The combined XPath above handles that case correctly.)


2016-11-30 18:37:33 matt

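It's interesting to see how this can be done with XPath counts, but it only works for a 'tag_a/tag_b/tag_a/tag_b' structure (e.g. '/table/tr/table/tr'), not for 'tag_a/tag_b/tag_b'. The OP said he wanted a general answer. –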

@EricDuminil You can extend this technique to handle cases like 'tag_a/tag_b/tag_b'. It's a bit more complicated, but not much. – matt

Thanks for the answer. Would you mind showing the corresponding solution? I really don't know much about XPath and would like to learn. –

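I don't know whether it can be done directly with CSS/XPath, so I wrote a small method that looks for the node recursively. It stops recursing as soon as it finds a match.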

xml= %q{ 
<root> 
    <table> 
    <tbody> 
     <tr nested="false"> 
     <td> 
      <table> 
      <tr nested="true"> 
       This is found</tr> 
      </table> 
     </td> 
     </tr> 
    </tbody> 
    </table> 
    <another_table> 
    <tr nested = "false"> 
     <tr nested = "true"/> 
    </tr> 
    </another_table> 
    <tr nested = "false"/> 
</root> 
} 

require 'nokogiri' 

doc = Nokogiri::XML.parse(xml) 

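class Nokogiri::XML::Node
  # Returns the outermost elements named desired_node in this subtree.
  # Once a node matches, its own descendants are not searched.
  def first_children_found(desired_node)
    if name == desired_node
      [self]
    else
      element_children.map { |child|
        child.first_children_found(desired_node)
      }.flatten
    end
  end
end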

doc.first_children_found('tr').each do |tr| 
    puts tr["nested"] 
end 

#=> 
# false 
# false 
# false


2016-11-30 17:26:01

It doesn't. If the HTML is a table followed directly by a tr (with nothing in between), it fails. – Dave

That isn't clear from your example. Do you have both 'table/tbody/tr' and 'table/tr'? –

My question is broader than that. How do I select a table's trs that are not nested inside other trs? – Dave
