2011-06-03 54 views

Answers

5

This code gives you the plain, unformatted text of the whole document:

require 'mechanize' 
require 'nokogiri' 

rational = Mechanize.new { |agent| 
    agent.user_agent_alias = 'Windows Mozilla' 
} 

document = Nokogiri::HTML(rational.get(ARGV[0]).content) 

# Taking the text of the whole document gives a very noisy result:
# results = document.inner_text

# My suggestion is to extract the text from a specific element instead
results = document.css("#content .my-element-with-some-contents").inner_text
+0

Works well. Thanks. I thought I could use Nokogiri methods on the Mechanize object.... – Radek 2011-06-03 07:11:01

+0

Mechanize is built on Nokogiri, so I think you are right! – 2011-06-03 09:12:26

+2

There is no need to parse the response yourself; you can write something like `rational.get(link); rational.page.at('/html/body/h1').text` – taro 2011-06-03 14:38:13