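I have a program that, when run, takes a keyword or search phrase as an argument and uses Mechanize to scrape Google for URLs matching it.
For example: pull_sites.rb "testing"
returns these sites: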
https://en.wikipedia.org/wiki/Software_testing
http://en.wikipedia.org/wiki/Test_automation
http://www.istqb.org/about-istqb.html
http://softwaretestingfundamentals.com/test-plan/
https://en.wikipedia.org/wiki/Software_testing
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:9qU2GDLzZzEJ:https://en.wikipedia.org/wiki/Software_testing%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://en.wikipedia.org/wiki/Test_strategy
https://en.wikipedia.org/wiki/Category:Software_testing
https://en.wikipedia.org/wiki/Test_automation
https://en.wikipedia.org/wiki/Portal:Software_testing
https://en.wikipedia.org/wiki/Test
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:R94CAo00wOYJ:https://en.wikipedia.org/wiki/Test%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://en.wikipedia.org/wiki/Unit_testing
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:G9V8uRLkPjIJ:https://en.wikipedia.org/wiki/Unit_testing%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://testing.byu.edu/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:d9bGrCHr9fsJ:https://testing.byu.edu/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://www.test.com/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:S92tylTr1V8J:https://www.test.com/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
http://ddce.utexas.edu/disability/using-testing-accommodations/
http://blogs.vmware.com/virtualblocks/2015/07/06/vsan-vs-nutanix-head-to-head-performance-testing-part-4-exchange/
http://www.networkforgood.com/nonprofitblog/testing-101-4-steps-optimizing-your-fundraising-approach/
http://www.auslea.com/software-testing-training.html
http://academy.littletonpublicschools.net/Default.aspx%3Ftabid%3D12807%26articleType%3DArticleView%26articleId%3D2400
https://golang.org/pkg/testing/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:EALG7Jlm9eoJ:https://golang.org/pkg/testing/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
http://www.speedtest.net/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:M47_v0xF3m8J:http://www.speedtest.net/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://www.act.org/content/act/en/products-and-services/the-act/taking-the-test.html
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:1sMSoJBXydoJ:https://www.act.org/content/act/en/products-and-services/the-act/taking-the-test.html%252Btesting%26gbv%3D1%26%26ct%3Dclnk
http://www.act.org/content/act/en/products-and-services/the-act/test-preparation.html
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:pAzlNJl3YY4J:http://www.act.org/content/act/en/products-and-services/the-act/test-preparation.html%252Btesting%26gbv%3D1%26%26ct%3Dclnk
It works as expected, but it only scrapes the first page of Google results. Is it possible to search, say, pages 1-5?
Here is the scraper source (if you are using Google Chrome or Firefox, open the developer tools):
require 'mechanize'
require 'colorize' # assumption: String#green comes from the colorize gem

def get_urls
  puts "Searching...".green
  agent = Mechanize.new
  page = agent.get('http://www.google.com/')
  google_form = page.form('f')
  google_form.q = SEARCH # SEARCH is the parameter given when the program is run
  page = agent.submit(google_form, google_form.buttons.first)
  page.links.each do |link|
    href = link.href.to_s
    # Result links look like /url?q=<target>&...; skip everything else.
    next unless href =~ /url\?q=/
    # The target is the text between the first "=" and the next "&".
    url = href.split(/=|&/)[1]
    File.open("links.txt", "a+") { |f| f.puts(url) }
  end
end
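The extracted URLs stay percent-encoded, as the %3F and %3D sequences in the output show. A minimal sketch of one way to decode them, using only Ruby's standard uri library: extract_target is a hypothetical helper name, not part of the original script, and it assumes the result links keep the /url?q=<target>&... shape that the scraper matches on.

```ruby
require 'uri'

# Hypothetical helper: pull the target URL out of a Google redirect href
# such as "/url?q=https://example.com/&sa=U".
# Returns nil when the href is not a /url redirect or has no q parameter.
def extract_target(href)
  uri = URI.parse(href.to_s)
  return nil unless uri.path == '/url' && uri.query

  # decode_www_form splits the query into [key, value] pairs and
  # percent-decodes each value.
  pair = URI.decode_www_form(uri.query).assoc('q')
  pair && pair.last
rescue URI::InvalidURIError
  nil
end
```

Inside the loop, url = extract_target(link.href) followed by a nil check could stand in for the regex test and the split.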
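Yes, it is possible. Have you tried clicking or navigating to the other pages? – kjprice
@kjprice How would I click and navigate to another page from inside the program while it is already running? The question is whether the program can search those pages, not whether I can click 2, 3, or 4 myself. – 13aal
@13aal Yes, you can tell Mechanize to click the page links at the bottom of the results page, scrape those pages, and so on. Is that what you are asking how to do? – bkunzi01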
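A rough sketch of the approach described in that last comment, not a tested answer: collect_result_hrefs is a hypothetical helper that scrapes a page, follows the pager link, and repeats. It assumes the pager link's text starts with "Next", which may not match the markup Google actually serves, and it only relies on the page responding to links and link_with and on links responding to href and click, as Mechanize pages and links do.

```ruby
# Hypothetical helper: visit up to max_pages result pages, starting from
# first_page, and return the hrefs of every result link found.
def collect_result_hrefs(first_page, max_pages: 5, next_text: /\ANext/i)
  hrefs = []
  page = first_page
  1.upto(max_pages) do |n|
    # Same filter as the original scraper: keep only /url?q=... links.
    page.links.each do |link|
      href = link.href.to_s
      hrefs << href if href =~ /url\?q=/
    end
    break if n == max_pages # do not fetch a page we will not scrape

    # Assumption: the pager link is labelled "Next".
    next_link = page.link_with(text: next_text)
    break unless next_link  # no further pages
    page = next_link.click  # with Mechanize, this fetches the next page
  end
  hrefs
end

# Usage with Mechanize (needs the mechanize gem and network access):
#   require 'mechanize'
#   agent = Mechanize.new
#   form = agent.get('http://www.google.com/').form('f')
#   form.q = 'testing'
#   first_page = agent.submit(form, form.buttons.first)
#   collect_result_hrefs(first_page, max_pages: 5).each { |h| puts h }
```

Following the link rather than building a search URL by hand keeps whatever query parameters Google put on the pager link.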