2016-02-05 315 views

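CSS selector only selects the first row

I'm parsing an HTML page with a long CSS selector (I couldn't find a shorter one because the page markup is a mess). It should select all tr elements in the table, but it only selects the second row. What am I missing?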

The selector:

body > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(3) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(8) > td:nth-child(1) > table:nth-child(4) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) tr:not(:first-child) 

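The page has many tables nested inside one another, but the first 90% of the selector doesn't really matter. After selecting the table I want, I append "[space]tr:not(...)", so it should select all the descendant rows, shouldn't it?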

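Example HTML page (I can't link to the real one because it requires a login): http://pastebin.com/gprXTvzz

The selector (... > tbody:nth-child(1) tr:not(:first-child)) does reach the table I want. The relevant part of the page looks like this: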

<tbody> 
    <tr valign="bottom"> 
     <td class="blackmedium" width="80"><b>Part Number</b></td> 
     <td class="blackmedium" width="100"><b>Manufacturer</b></td> 
     <td class="blackmedium" width="40"><b>Abbr.</b></td> 
     <td class="blackmedium" width="50"><b>WIX Part Number</b></td> 
     <td class="blackmedium" width="50"><b>Lead Time</b></td> 
    </tr> 
    <tr> 
     <td class="blackmedium" width="80">A0002701098</td> 
     <td class="blackmedium" width="100">MERCEDES-BENZ</td> 
     <td class="blackmedium" width="40">MBZ</td> 
     <td class="blackmedium" width="50"> <a href="http://www.wixindustrialfilters.com/cross.aspx?Part=W03AT780" target="_blank">W03AT780</a> 
     </td> 
     <td class="blackmedium" width="50"> 
     STOCK 
     </td> 
    </tr> 
    <tr bgcolor="#e0e0e0"> 
     <td class="blackmedium" width="80">A0002701598 Discontinued</td> 
     <td class="blackmedium" width="100">MERCEDES-BENZ</td> 
     <td class="blackmedium" width="40">MBZ</td> 
     <td class="blackmedium" width="50"> <a href="javascript:var w=window.open('PartDetail.asp?Part=58892','PartDetail','left=200,top=200,width=530,height=500,toolbar=no,location=no,directories=no,status=no,menubar=no,resizable=yes,scrollbars=yes');w.focus();">58892</a> 
     </td> 
     <td class="blackmedium" width="50"> 
     </td> 
    </tr> 
    <tr> 
     <td class="blackmedium" width="80">A0002772395</td> 
     <td class="blackmedium" width="100">MERCEDES-BENZ</td> 
     <td class="blackmedium" width="40">MBZ</td> 
     <td class="blackmedium" width="50"> <a href="javascript:var w=window.open('PartDetail.asp?Part=51249','PartDetail','left=200,top=200,width=530,height=500,toolbar=no,location=no,directories=no,status=no,menubar=no,resizable=yes,scrollbars=yes');w.focus();">51249</a> 
     </td> 
     <td class="blackmedium" width="50"> 
     </td> 
    </tr> 
    <tr bgcolor="#e0e0e0"> 
     <td class="blackmedium" width="80">A0002772895</td> 
     <td class="blackmedium" width="100">MERCEDES-BENZ</td> 
     <td class="blackmedium" width="40">MBZ</td> 
     <td class="blackmedium" width="50"> <a href="javascript:var w=window.open('PartDetail.asp?Part=57701','PartDetail','left=200,top=200,width=530,height=500,toolbar=no,location=no,directories=no,status=no,menubar=no,resizable=yes,scrollbars=yes');w.focus();">57701</a> 
     </td> 
     <td class="blackmedium" width="50"> 
     </td> 
    </tr> 
</tbody> 

Answers

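This doesn't exactly answer your question, but when the markup isn't parse-friendly and I need to find a table element deeply nested in horrible markup, I prefer to locate it by a specific header it contains. In this case, you could look for the table that has the Part Number header. Example XPath: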

//table[tr[1]/td/b = "Part Number"] 
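The same header-based lookup can be sketched without a full XPath engine. The snippet below is only an illustration using Python's standard-library ElementTree on a small, well-formed stand-in for the page; the sample markup and the helper name find_table_by_header are assumptions, not part of the original page. Like the XPath above, the sample has no tbody; with a parser that inserts tbody, the path would need that extra step.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the deeply nested page markup.
SAMPLE = """
<html><body>
<table><tr><td>
  <table>
    <tr><td><b>Part Number</b></td><td><b>Manufacturer</b></td></tr>
    <tr><td>A0002701098</td><td>MERCEDES-BENZ</td></tr>
    <tr><td>A0002772395</td><td>MERCEDES-BENZ</td></tr>
  </table>
</td></tr></table>
</body></html>
"""

def find_table_by_header(root, header):
    """Return the first table whose first row has a <b> cell equal to header."""
    for table in root.iter("table"):
        # "tr[1]/td/b" only follows direct children, so wrapper tables
        # that merely contain the target table do not match.
        cells = table.findall("tr[1]/td/b")
        if any((b.text or "").strip() == header for b in cells):
            return table
    return None

root = ET.fromstring(SAMPLE.strip())
parts_table = find_table_by_header(root, "Part Number")
first_header = parts_table.find("tr[1]/td/b").text
```

Anchoring on the header text keeps the lookup independent of how many wrapper tables surround the target.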

Then, within that table, you can use the "not first child" CSS selector:

tr:not(:first-child) 

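Alternatively, you can use the adjacent sibling selector (it matches a tr element that immediately follows another tr element, which logically excludes the first row):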

tr + tr 
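To see why the two row filters agree on a table like this one, here is a rough sketch that emulates both by hand with Python's standard-library ElementTree. It is not a CSS engine; the sample table and the function names not_first_child and adjacent_sibling are made up for illustration, and the equivalence assumes the rows' parent contains only tr children.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified version of the parts table.
TABLE = """
<table>
  <tr><td>Part Number</td></tr>
  <tr><td>A0002701098</td></tr>
  <tr><td>A0002701598 Discontinued</td></tr>
  <tr><td>A0002772395</td></tr>
</table>
"""

def not_first_child(parent):
    # Emulates "tr:not(:first-child)": tr elements that are not
    # the first child element of their parent.
    return [el for i, el in enumerate(parent) if el.tag == "tr" and i > 0]

def adjacent_sibling(parent):
    # Emulates "tr + tr": a tr immediately preceded by another tr.
    children = list(parent)
    return [cur for prev, cur in zip(children, children[1:])
            if prev.tag == "tr" and cur.tag == "tr"]

table = ET.fromstring(TABLE.strip())
rows_a = [row.find("td").text for row in not_first_child(table)]
rows_b = [row.find("td").text for row in adjacent_sibling(table)]
```

Both lists contain every row except the header row, in document order.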

Hope this simplifies things.

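I couldn't use XPath, but I solved it by first getting all the tables, picking the one I need by its known index, and then selecting all tr elements in the next statement. Yours should work too. (Using jSoup) – appl3r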