Ruby Google request failing
I am creating a Google Dork tool that sends a URL-encoded query to google.com and returns the results as an array of links.
#!/usr/bin/env ruby
require 'cgi'
require 'socket'

# define full path to library
cwd = File.expand_path(File.dirname(__FILE__))
lib = File.join(cwd, "lib")

# require project library files (RandomAgent presumably lives here)
Dir.new(lib).each do |x|
  next unless x[/\.rb$/]
  begin
    require File.join(lib, x)
  rescue LoadError
    # a bare rescue would not catch LoadError, which is not a StandardError
    raise LoadError, "Failed to load #{x}."
  end
end

# build the google dork
def query(ext, site, inurl, intitle, intext)
  dorks  = %w(ext site inurl intitle intext)
  values = [ext, site, inurl, intitle, intext]
  terms = dorks.zip(values).map do |dork, value|
    next if value.nil?
    # the in* operators take a quoted phrase
    value = %Q("#{value}") if dork.start_with?("in")
    "#{dork}:#{value}"
  end
  # joining with spaces leaves no trailing space to chop off later
  terms.compact.join(" ")
end

# sends the search query to google.com
def search(host, query, agent)
  links = []
  sock = TCPSocket.new(host, 80)
  query = CGI.escape(query)
  # agent is currently unused; the earlier variant of the request was:
  # "GET /search?q=#{query} HTTP/1.0\r\nUser-Agent: #{agent}\r\nConnection: Close\r\n\r\n"
  request = "GET /search?q=#{query} HTTP/1.0\r\n\r\n"
  sock.write request
  response = sock.read
  sock.close
  # everything after the first blank line is the body
  body = response.split("\r\n\r\n", 2)[1].to_s
  body.split("url?q=").each do |link|
    link = link.to_s.split("&")[0].to_s
    links << link if link =~ /^https?:\/\// && link !~ /^http:\/\/webcache/
  end
  links
end

agent = RandomAgent.new
host = "google.com"
q = query(ARGV[0], ARGV[1], ARGV[2], ARGV[3], ARGV[4])
puts search(host, q, agent.randomize)
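As a quick way to check the query-building and encoding steps in isolation, here is a minimal sketch that builds the dork string from operator/value pairs and URL-encodes it with `CGI.escape`. The helper name `build_dork` and its hash-based interface are assumptions made for illustration; they are not part of the script above.

```ruby
require 'cgi'

# Hypothetical helper: builds a dork string from operator => value pairs.
# Operators starting with "in" get their value wrapped in double quotes,
# mirroring the script's behaviour; nil values are skipped.
def build_dork(terms)
  terms.reject { |_, value| value.nil? }.map do |operator, value|
    value = %Q("#{value}") if operator.to_s.start_with?("in")
    "#{operator}:#{value}"
  end.join(" ")
end

dork = build_dork(ext: "pdf", site: "github.com", inurl: "email")
encoded = CGI.escape(dork)

puts dork
puts "GET /search?q=#{encoded} HTTP/1.0"
```

Running this with no network access at all makes it easy to confirm the exact request line before it is written to the socket.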
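For some reason I haven't figured out, the request works if I send it manually, but if I send it from the Ruby script, Google returns an HTTP 302 (a redirect) instead of results. For example: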
GET /search?q=ext%3Apdf+site%3Agithub.com+inurl%3A%22email%22 HTTP/1.0
This is the request my script produces. When the script sends it, I get an HTTP 302. If I send the same request manually with nc, results are returned.
nc google.com 80
GET /search?q=ext%3Apdf+site%3Agithub.com+inurl%3A%22email%22 HTTP/1.0
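Since a 302 is a redirect, the response headers (the `Location` header in particular) show where the server is sending the client, which may help when comparing the two cases. Below is a minimal sketch of splitting a raw response into status code, headers, and body. The helper `parse_response` is hypothetical, and the sample response is made up for illustration; it is not actual Google output.

```ruby
# Hypothetical helper: splits a raw HTTP response into status, headers, body.
def parse_response(raw)
  head, body = raw.split("\r\n\r\n", 2)
  lines = head.to_s.split("\r\n")
  # First line looks like "HTTP/1.0 302 Found"; the second field is the code.
  status = lines.shift.to_s.split(" ", 3)[1].to_i
  headers = {}
  lines.each do |line|
    name, value = line.split(":", 2)
    next if value.nil?
    headers[name.strip.downcase] = value.strip
  end
  { status: status, headers: headers, body: body.to_s }
end

# Made-up sample response, for illustration only (not real Google output).
sample = "HTTP/1.0 302 Found\r\n" \
         "Location: http://example.com/elsewhere\r\n" \
         "Content-Length: 0\r\n" \
         "\r\n"

parsed = parse_response(sample)
puts parsed[:status]
puts parsed[:headers]["location"]
```

In the script, the string returned by `sock.read` could be passed to such a helper before the link extraction, so the redirect target is printed rather than silently discarded.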
On top of that, if I send only this:
GET /search?q=ext%3Apdf+site%3Agithub.com HTTP/1.0
it works. The third parameter causes it to break for some reason, and I can't seem to figure out why. Thanks.
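One variation that may be worth testing is sending explicit headers, as in the commented-out version of the request in the script. The sketch below only assembles the request text. `build_request` and the placeholder user-agent string are assumptions for illustration, and whether these headers change Google's response is not something established here.

```ruby
require 'cgi'

# Hypothetical helper: assembles an HTTP/1.0 request with explicit headers.
# The two trailing empty strings produce the blank line that ends the headers.
def build_request(host, query, agent)
  [
    "GET /search?q=#{CGI.escape(query)} HTTP/1.0",
    "Host: #{host}",
    "User-Agent: #{agent}",
    "Connection: close",
    "", ""
  ].join("\r\n")
end

# "ExampleAgent/1.0" is a placeholder, not a real browser string.
request = build_request("google.com", 'ext:pdf inurl:"email"', "ExampleAgent/1.0")
print request
```

Writing the result with `sock.write` rather than `sock.puts` keeps the bytes on the wire identical to the string built here.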