2012-01-12
3

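Nokogiri and XPath: nested loops over a node set

I am trying to loop over each row and then over each cell in it, but I ran into a problem with the inner loop below. It looks like the XPath pattern '*/td' is not returning any results. I expected to see the data inside the td tags printed to standard output. I am using Nokogiri.

I pasted this into my Rails console: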

require 'nokogiri' 
f = File.open("public/index.html") 
doc = Nokogiri::HTML(f) 
f.close 

doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row| 
    puts "row= " + row.to_s 
    row.xpath('*/td').each do |td| 
    puts "td= " + td 
    end 
end 

Here is the output from the console:

row= <tr id="208894"> 
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td> 
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td> 
<td headers="WhoIsOnDutyTableLevel1:header:3">0</td> 
</tr> 
row= <tr id="207792"> 
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td> 
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td> 
<td headers="WhoIsOnDutyTableLevel1:header:3">5</td> 
</tr> 
=> 0 

Here is the HTML I am parsing:

<table class="duty-report-level1" id="WhoIsOnDutyTableLevel1"> 
<caption></caption> 
<thead> 

<tr> 
<th id="WhoIsOnDutyTableLevel1:header:1" class="duty-report-lt-header">c</th> 
</tr> 
</thead> 
<tfoot></tfoot> 
<tbody> 
<tr> 
<td headers="WhoIsOnDutyTableLevel1:header:1"> 
<table class="duty-report-level2" id="WhoIsOnDutyTableLevel2"> 
<caption></caption> 
<thead> 
<tr> 
<th id="WhoIsOnDutyTableLevel1:header:1">Group Name</th><th id="WhoIsOnDutyTableLevel1:header:2">Group Time Zone</th><th id="WhoIsOnDutyTableLevel1:header:3">Default Devices</th><th id="WhoIsOnDutyTableLevel1:header:4">Supervisors</th> 

</tr> 
</thead> 
<tfoot></tfoot> 
<tbody> 
<tr> 
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/GroupDetails.do;jsessionid=17gaw4aw5pv8s?_data=TJZuNquzHUgWcre8AVcKpAFRUsezgPKzbHn7hwtTf9Ei0C2PJ8QYcKIy8OkorCWT8HDTAzkon1ls%0D%0AefuHC1N%2F0SLQLY8nxBhwesdd7Zeg6NzvCfuzRqLg5g%3D%3D" name="team1" id="team1" class="details">Team 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2" class="centered-text">US/Pacific</td><td headers="WhoIsOnDutyTableLevel1:header:3" class="centered-text"><img src="/static/images/icon_boolean_false.png" alt="No" border="0"></td><td headers="WhoIsOnDutyTableLevel1:header:4"> 
<values> 
</values><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z7AnuRhH67H6AixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="mgr1" id="mgr1" class="details">Mgr 1</a> 
<br> 








</td> 
</tr> 
<tr> 
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="4"> 
<table class="duty-report-level3" id="WhoIsOnDutyTableLevel3"> 
<caption></caption> 
<thead> 
<tr> 
<th id="WhoIsOnDutyTableLevel1:header:1" class="th-left">a</th><th id="WhoIsOnDutyTableLevel1:header:2" class="">b</th> 
</tr> 
</thead> 

<tfoot></tfoot> 
<tbody> 
<tr> 
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="2"> 
<table class="duty-report-level4" id="WhoIsOnDutyTableLevel4"> 
<caption></caption> 
<thead> 
<tr> 
<th id="WhoIsOnDutyTableLevel1:header:1">Recipient</th><th id="WhoIsOnDutyTableLevel1:header:2">Category</th><th id="WhoIsOnDutyTableLevel1:header:3">Escalation</th> 
</tr> 
</thead> 
<tfoot></tfoot> 
<tbody> 
<tr id="208894"> 

<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">0</td> 
</tr> 
<tr id="207792"> 
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">5</td> 
</tr> 




</tbody> 
</table> 

</td> 
</tr> 
</tbody> 
</table> 
</td> 
</tr> 
</tbody> 
</table> 
</td> 
</tr> 
</tbody> 
</table> 
+0

Sorry, I expected to see the data inside the td tags printed out. – sybind 2012-01-12 19:42:17

Answers

5

You need one small change to your XPath:

doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row| 
    # puts "row= " + row.to_s 
    row.xpath('./td').each do |td|
        puts "td= " + td.text
    end
end 

Which outputs:

 
td= User 1 
td= PERSON 
td= 0 
td= User 2 
td= PERSON 
td= 5 

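Using ./td as the XPath basically means "from this node, look one level down for td elements". The original '*/td' looks two levels down instead, for td elements that are grandchildren of the row, and there are none.

Personally, unless you absolutely need XPath, I recommend using the CSS accessors. They are more readable and often a lot simpler: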

doc.search('#WhoIsOnDutyTableLevel4 tbody tr').each do |row| 
    row.search('td').each do |td|
        puts "td= " + td.text
    end
end 

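I recommend using search instead of css or xpath, and at instead of at_css or at_xpath. There is no real magic happening when you pick one over the other, and this way you only have to remember two methods.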

+0

Thank you very much. This was driving me nuts. – sybind 2012-01-12 20:20:32

+0

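Parsing XML/HTML takes a little getting used to, but once you have done that, it is easy to take apart pages and XML data. XPath is very powerful, but to me it looks like line noise, which is why I prefer CSS. – 2012-01-12 21:10:46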

0

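The XPath expression in the inner loop is evaluated relative to each tr, so you want to use td (which selects the td children of the context tr) and not */td (which selects td grandchildren).

Full code: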

doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row| 
    puts "row= " + row.to_s
    row.xpath('td').each do |td|
        puts "td= " + td
    end
end