2016-09-23 127 views

Accessing the child elements of a Nokogiri element

After parsing an HTML table, I am able to get the first row of the table as a Nokogiri element:

2.2.1 :041 > pp content[1]; nil 
#(Element:0x3feee917d1e0 { 
    name = "tr", 
    children = [ 
    #(Element:0x3feee917cfd8 { 
     name = "td", 
     attributes = [ 
     #(Attr:0x3feee917cf74 { name = "valign", value = "top" })], 
     children = [ 
     #(Element:0x3feee917ca60 { 
      name = "a", 
      attributes = [ 
      #(Attr:0x3feee917c9fc { 
       name = "href", 
       value = "/cgi-bin/own-disp?action=getowner&CIK=0001513362" 
       })], 
      children = [ #(Text "Maestri Luca")] 
      })] 
     }), 
    #(Text "\n"), 
    #(Element:0x3feee917c150 { 
     name = "td", 
     children = [ 
     #(Element:0x3feee917d794 { 
      name = "a", 
      attributes = [ 
      #(Attr:0x3feee9179fb8 { 
       name = "href", 
       value = "/cgi-bin/browse-edgar?action=getcompany&CIK=0001513362" 
       })], 
      children = [ #(Text "0001513362")] 
      })] 
     }), 
    #(Text "\n"), 
    #(Element:0x3feee91796a8 { 
     name = "td", 
     children = [ #(Text "2016-09-04")] 
     }), 
    #(Text "\n"), 
    #(Element:0x3feee9179194 { 
     name = "td", 
     children = [ #(Text "officer: Senior Vice President, CFO")] 
     }), 
    #(Text "\n")] 
    }) 
=> nil 

This is the content of that row:

Maestri Luca 0001513362 2016-09-04 officer: Senior Vice President, CFO

I need to access the name, number, date, and title from the Nokogiri element. One way of doing this is as follows:

2.2.1 :042 > pp content[1].text; nil 
"Maestri Luca\n0001513362\n2016-09-04\nofficer: Senior Vice President, CFO\n" 

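However, I am looking for a way to access the elements individually, rather than as one long string separated by newlines. How can I do that?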

Answer

name, number, date, title = *content[1].css('td').map(&:text) 

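If content[1] is a tr, content[1].css('td') will find all the td elements under it. .map(&:text) will then call td.text on each td and collect the results into an array, which we splat with * so that we can do a multiple assignment.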

(Note: next time, please include the original HTML fragment rather than the Nokogiri node inspection output.)