CSS selection using Nokogiri

I am trying to scrape HTML with Nokogiri, but I am not getting the expected results.

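At this particular URL I am looking at the deals for a specific location, and I want to display the deal details from that page. .small-deals-cont is the CSS selector for each deal's container on the page, and .deal-title is the CSS selector for the deal title.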

require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = "http://www.snapdeal.com/local-deals-Chennai-all?category=all&HID=dealHeader_all"

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
doc = Nokogiri::HTML(URI.open(url))

puts doc.at_css("title").text

doc.css(".small-deals-cont").each do |item|
  puts item.at_css(".deal-title")
end


2012-09-03 Anu11

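To prevent scraping, they may be loading the content after the initial page load (using JavaScript). In that case Nokogiri won't help you, and you will need a more sophisticated system, possibly Mechanize.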

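In the end, though, you shouldn't scrape. The owner of this site has taken measures to prevent it, and you should respect that. Check whether there is an API.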


2012-09-03 14:48:12

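+1 for recommending the API. Mechanize won't help with JavaScript, because it is not a JavaScript interpreter. If scraping is really necessary, Watir or one of its derivatives would be a better choice.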

Nokogiri actually works for this, and we don't need to use Mechanize. Here is the code:

require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'csv'

hotel = []
cuisine = []

url = "http://www.abcd.com"

# Collect the name and cuisine tags of every restaurant on pages 1 to 5
1.upto(5) do |page_num|
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
  doc = Nokogiri::HTML(URI.open("#{url}/cit/restaurants?page=#{page_num}"))
  puts doc.at_css("title").text

  doc.css("article").each do |item|
    hotel << item.at_css("a").text
    cuisine << item.at_css(".tags").text
  end
end

@hotel = hotel
@cuisine = cuisine

(0..@hotel.length - 1).each do |index|
  puts "Hotel: #{@hotel[index]}"
  puts "Cuisine: #{@cuisine[index]}"
  puts " "
end

# Write the same pairs to a CSV file with a header row
CSV.open("output2.csv", "wb") do |row|
  row << ["Hotel", "Cuisine"]

  (0..@hotel.length - 1).each do |index|
    row << [@hotel[index], @cuisine[index]]
  end
end
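
The index loops above can also be written by pairing the two arrays directly. This is a minimal sketch using only Ruby's standard csv library; the sample names are made up for illustration, and rows_to_csv is a hypothetical helper, not part of the original answer.

```ruby
require 'csv'

# Hypothetical helper: pair two parallel arrays row by row
# and render them as CSV text with a header row.
def rows_to_csv(hotels, cuisines)
  CSV.generate do |csv|
    csv << ["Hotel", "Cuisine"]
    hotels.zip(cuisines).each { |hotel, cuisine| csv << [hotel, cuisine] }
  end
end

# Made-up sample data standing in for the scraped values.
hotels = ["Hotel A", "Hotel B"]
cuisines = ["Indian", "Chinese"]

puts rows_to_csv(hotels, cuisines)
```

Note that zip pads with nil when the second array is shorter than the first, which shows up as an empty field in the CSV rather than raising an error.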


2012-09-07 11:22:28 Anu11
