:has CSS pseudo-class in Nokogiri

I'm looking for the :has pseudo-class in Nokogiri. It should work like jQuery's has selector.

For example:

<li><h1><a href="dfd">ex1</a></h1><span class="string">sdfsdf</span></li> 
<li><h1><a href="dsfsdf">ex2</a></h1><span class="string"></span></li> 
<li><h1><a href="sdfd">ex3</a></h1></li>

The CSS selector should return only the first link, the one with a non-empty span.string sibling.

In jQuery this selector works well:

$('li:has(span.string:not(:empty))>h1>a')

but not in Nokogiri:

Nokogiri::HTML(html_source).css('li:has(span.string:not(:empty))>h1>a')

:not and :empty work fine, but :has does not.

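Is there any documentation for Nokogiri's CSS selectors?
Maybe someone can write a custom :has pseudo-class? Here is an example of how to write a :regexp selector.
(Optionally) I could use XPath. How do I write an XPath for li:has(span.string:not(:empty))>h1>a?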

2012-08-01 rogal111

The ':has' pseudo-class is a jQuery extension, so I guess Nokogiri doesn't support it, since it isn't part of any standard. – 2012-08-01 13:22:24

OK, so my last 3 questions still apply. – rogal111 2012-08-01 13:26:42

Given the HTML you provided, //li[span[@class="string"][count(node()) > 0]]/h1/a selects the node with the content 'ex1' (the first 'a'). – 2012-08-01 13:42:24

The problem with Nokogiri's current implementation of :has() is that the XPath it creates requires the content to be a direct child rather than any descendant:

puts Nokogiri::CSS.xpath_for("a:has(b)") 
#=> "//a[b]" 
#=> Should output "//a[.//b]" to be correct

To make this XPath match what jQuery does, you need to allow the span to be a descendant element. For example:

require 'nokogiri' 
d = Nokogiri.XML('<r><a/><a><b><c/></b></a></r>') 
d.at_css('a:has(b)') #=> #<Nokogiri::XML::Element:0x14dd608 name="a" children=[#<Nokogiri::XML::Element:0x14dd3e0 name="b" children=[#<Nokogiri::XML::Element:0x14dd20c name="c">]>]> 
d.at_css('a:has(c)') #=> nil 
d.at_xpath('//a[.//c]') #=> #<Nokogiri::XML::Element:0x14dd608 name="a" children=[#<Nokogiri::XML::Element:0x14dd3e0 name="b" children=[#<Nokogiri::XML::Element:0x14dd20c name="c">]>]>

For your specific case, here is the complete "broken" XPath:

puts Nokogiri::CSS.xpath_for("li:has(span.string:not(:empty)) > h1 > a") 
#=> //li[span[contains(concat(' ', @class, ' '), ' string ') and not(not(node()))]]/h1/a

And here it is, fixed:

# Adding just the .// 
//li[.//span[contains(concat(' ', @class, ' '), ' string ') and not(not(node()))]]/h1/a 

# Simplified to assume only one CSS class is present on the span 
//li[.//span[@class='string' and not(not(node()))]]/h1/a 

# Assuming that `not(:empty)` really meant "Has some text in it" 
//li[.//span[@class='string' and text()]]/h1/a 

# ..or maybe you really wanted "Has some text anywhere underneath" 
//li[.//span[@class='string' and .//text()]]/h1/a 

# ..or maybe you really wanted "Has at least one element child" 
//li[.//span[@class='string' and *]]/h1/a

2012-08-01 17:36:48 Phrogz

Nokogiri has no :has selector; here is documentation on what it does support: http://ruby.bastardsbook.com/chapters/html-parsing/#h-2-2

2012-08-01 13:39:18 Austin

Your link only has a few examples of how to use it; it isn't documentation of Nokogiri's CSS selectors. – rogal111 2012-08-01 13:42:39

It's the most descriptive and explanatory source I know of. If you want to see what I mean, look at [Nokogiri's Documentation](http://nokogiri.org/Nokogiri/CSS.html) – Austin 2012-08-01 13:44:48

OK, I found a solution; maybe it will be useful to someone.

Custom pseudo-class :custom_has:

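class MyCustomSelectors
    # Nokogiri calls this for :custom_has(...), passing the current node set
    # and the pseudo-class argument. Keep nodes with a matching descendant.
    # (.any? instead of ActiveSupport's .present?, so plain Ruby works.)
    def custom_has node_set, selector
     node_set.find_all { |node| node.css(selector).any? }
    end
end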

#usage: 
doc.css('li:custom_has(span.string:not(:empty))>h1>a',MyCustomSelectors.new)

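Why did I declare :custom_has rather than just :has? Because :has is already declared. There are tests for the :has selector in the Nokogiri repo, but they don't work. I reported this issue to the author.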

2012-08-01 13:59:52 rogal111

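Nokogiri lets you chain .css() and .xpath() calls on the same object. So whenever you would want :has, just end the current .css() call and add .xpath("..") (the parent selector). You can even resume your selection with another .css() call, picking up where your .xpath() left off!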

Example:

Here is some HTML from Wikipedia:

<tr> 
    <th scope="row" style="text-align:left;"> 
     Origin 
    </th> 
    <td> 
     <a href="/wiki/Edinburgh" title="Edinburgh">Edinburgh</a> 
     <a href="/wiki/Scotland" title="Scotland">Scotland</a> 
    </td> 
</tr> 
<tr> 
    <th scope="row" style="text-align:left;"> 
     <a href="/wiki/Music_genre" title="Music genre">Genres</a> 
    </th> 
    <td> 
     <a href="/wiki/Electronica" title="Electronica">Electronica</a> 
     <a href="/wiki/Intelligent_dance_music" title="Intelligent dance music">IDM</a> 
     <a href="/wiki/Ambient_music" title="Ambient music">ambient</a> 
     <a href="/wiki/Downtempo" title="Downtempo">downtempo</a> 
     <a href="/wiki/Trip_hop" title="Trip hop">trip hop</a> 
    </td> 
</tr> 
<tr> 
    <th scope="row" style="text-align:left;"> 
     <a href="/wiki/Record_label" title="Record label">Labels</a> 
    </th> 
    <td> 
     <a href="/wiki/Warp_(record_label)" title="Warp (record label)">Warp</a> 
     <a href="/wiki/Skam_Records" title="Skam Records">Skam</a> 
     <a href="/wiki/Music70" title="Music70">Music70</a> 
    </td> 
</tr>

Say you want to select all <a> elements inside the first <td> that follows the <th> containing a link with href="/wiki/Music_genre".

@artistPage.css("table th > a[href='/wiki/Music_genre']").xpath("..").css("+ td a")

This returns the <a> element for every listed genre.

Now, for good measure, let's grab the inner text of all those <a> elements and put it into an array.

@genreLinks = @artistPage.css("table th > a[href='/wiki/Music_genre']").xpath("..").css("+ td a") 
@genres = [] 
@genreLinks.each do |genreLink| 
    @genres.push(genreLink.text) 
end

2013-10-14 18:41:17 musophob
