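Mechanize does not load the full web page after the search form is submitted. Why don't I get the products on [this][1] page when I search for a single space " " in the search form? I only see the menu, not the product search results.
Ruby code: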
require 'nokogiri'
require 'mysql2'
require 'logger'
require 'mechanize'
agent = Mechanize.new{|a| a.log = Logger.new(STDERR) }
agent.user_agent_alias = 'Windows Mozilla'
agent.read_timeout = 60
def add_cookie(agent, uri, cookie)
uri = URI.parse(uri)
Mechanize::Cookie.parse(uri, cookie) do |cookie|
agent.cookie_jar.add(uri, cookie)
end
end
login_page = agent.get "http://www.example.com.mx/login.php?location=%2F"
login_form = login_page.form_with(:method => 'POST')
email_field = login_form.field_with(name: "correo_ingresar")
password_field = login_form.field_with(name: "password")
email_field.value = '[email protected]'
password_field.value = 'password'
home_page = login_form.submit
myarray = home_page.body.scan(/SetCookie\(\"(.+)\", \"(.+)\"\)/)
myarray.each{|line| add_cookie agent, 'http://www.example.com.mx', "#{line[0]}=#{line[1]}"}
add_cookie(agent, 'http://www.example.com.mx', "forzar_existencias=1; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "articulos_mostrar=50; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "forz_existencias=1; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "no_actualiza=1; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "orden_mostrar=8; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "page=1; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "precio_inicio=0; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "location=%2Farticulos.php%3Fbuscar%3D%2B; path=/; domain=www.example.com.mx")
search_form = home_page.forms.first
search_field = search_form.field_with(name: "buscar")
search_field.value = ' '
search_results = search_form.submit
resultados = 'http://example.com.mx/articulos.php?buscar=+'
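I installed the Live HTTP Headers add-on for Firefox, along with Firebug. When I enter a single space and click the search button on the [web page][1], Live HTTP Headers captures the following: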
http://example.com.mx/articulos.php?buscar=+
GET /articulos.php?buscar=+ HTTP/1.1
Host: example.com.mx
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://example.com.mx/articulos.php?buscar=+
Cookie: _ga=GA1.3.162897808.1438611502; _gat=1
Connection: keep-alive
HTTP/1.1 200 OK
Date: Sat, 08 Aug 2015 04:29:40 GMT
Server: Apache
x-powered-by: PHP/5.4.30
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html
----------------------------------------------------------
http://www.google-analytics.com/collect?v=1&_v=j37&a=1988602157&t=pageview&_s=1&dl=http%3A%2F%2Fexample.com.mx%2Farticulos.php%3Fbuscar%3D%2B&ul=en-us&de=UTF-8&dt=Sistemas%20Aplicados&sd=24-bit&sr=1920x1080&vp=1903x969&je=0&_u=AACAAEABI~&jid=&cid=162897808.1438611502&tid=UA-58813310-1&z=90642832
GET /collect?v=1&_v=j37&a=1988602157&t=pageview&_s=1&dl=http%3A%2F%2Fexample.com.mx%2Farticulos.php%3Fbuscar%3D%2B&ul=en-us&de=UTF-8&dt=Sistemas%20Aplicados&sd=24-bit&sr=1920x1080&vp=1903x969&je=0&_u=AACAAEABI~&jid=&cid=162897808.1438611502&tid=UA-58813310-1&z=90642832 HTTP/1.1
Host: www.google-analytics.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://example.com.mx/articulos.php?buscar=+
Connection: keep-alive
HTTP/1.1 200 OK
Pragma: no-cache
Expires: Mon, 07 Aug 1995 23:30:00 GMT
Access-Control-Allow-Origin: *
Last-Modified: Sun, 17 May 1998 03:00:00 GMT
x-content-type-options: nosniff
Content-Type: image/gif
Date: Wed, 29 Jul 2015 12:33:33 GMT
Server: Golfe2
Content-Length: 35
Age: 834969
Alternate-Protocol: 80:quic,p=0
Cache-Control: private, no-cache, no-cache=Set-Cookie, proxy-revalidate
----------------------------------------------------------
http://example.com.mx/resultados.php
POST /resultados.php HTTP/1.1
Host: example.com.mx
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: http://example.com.mx/articulos.php?buscar=+
Content-Length: 204
Cookie: _ga=GA1.3.162897808.1438611502; _gat=1
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
opcion=&buscar=+&page=1&articulos_mostrar=10&orden_mostrar=1&seccion=&linea=&sublinea=&forz_existencias=1&precio_inicio=0&precio_final=20000&location=%252Farticulos.php%253Fbuscar%253D%252B&no_actualiza=1
HTTP/1.1 200 OK
Date: Sat, 08 Aug 2015 04:29:42 GMT
Server: Apache
x-powered-by: PHP/5.4.30
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html
----------------------------------------------------------
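The question is: how do I get the full product listing that the page shows, so that I can start scraping? Even when the request carries a Referer link, the products are not fetched automatically. [This][2] is the generated HTML: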
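Stepping back: what are you actually interested in? Product prices? Ordering things automatically? – Felix
I am interested in getting all the products – ingalcala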
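The Live HTTP Headers trace in the question shows a second request after the page loads: a POST to /resultados.php carrying `X-Requested-With: XMLHttpRequest`. That suggests the product list is filled in by JavaScript, which Mechanize does not execute, so `search_form.submit` would only return the page shell with the menu. A minimal sketch of replaying that POST directly follows, using only Ruby's standard library. The helper names (`results_params`, `build_results_request`, `fetch_results`) are hypothetical, the field values are copied from the captured request body, and whether the server also requires the login session cookies is an assumption that has not been verified here.

```ruby
require 'net/http'
require 'uri'

# Endpoint taken from the captured XMLHttpRequest in the trace.
RESULTS_URL = 'http://example.com.mx/resultados.php'

# Form fields copied from the captured POST body. The `location` value
# is already percent-encoded once, so it is encoded a second time on
# the wire (%252F...), exactly as in the trace.
def results_params(term, page: 1, per_page: 10)
  {
    'opcion'            => '',
    'buscar'            => term,
    'page'              => page.to_s,
    'articulos_mostrar' => per_page.to_s,
    'orden_mostrar'     => '1',
    'seccion'           => '',
    'linea'             => '',
    'sublinea'          => '',
    'forz_existencias'  => '1',
    'precio_inicio'     => '0',
    'precio_final'      => '20000',
    'location'          => '%2Farticulos.php%3Fbuscar%3D%2B',
    'no_actualiza'      => '1'
  }
end

# Builds the POST request without sending it, so it can be inspected.
def build_results_request(term, page: 1, per_page: 10)
  uri = URI.parse(RESULTS_URL)
  request = Net::HTTP::Post.new(uri.request_uri)
  request.set_form_data(results_params(term, page: page, per_page: per_page))
  request['X-Requested-With'] = 'XMLHttpRequest'
  request['Referer'] = 'http://example.com.mx/articulos.php?buscar=+'
  request
end

# Sends the request and returns the response body, which should be the
# HTML fragment with the products (assumption: no session cookie needed).
def fetch_results(term, page: 1, per_page: 10)
  uri = URI.parse(RESULTS_URL)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_results_request(term, page: page, per_page: per_page)).body
  end
end
```

Calling `fetch_results(' ')` would then return the fragment to hand to `Nokogiri::HTML`. To stay inside the logged-in Mechanize session, the same parameter hash and the two extra headers can probably be passed to `agent.post(url, params, headers)` instead, so that the agent's cookie jar is sent along.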