2011-08-20

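I'm building a Google scraper in Adobe AIR using the Flex 4 framework.
I've hit a brick wall: Google forces a CAPTCHA after about 10 pages have been fetched, so I need Flex to send its requests through a proxy server.

Can anyone tell me how to fetch a page through a proxy server?

I'm using HTTPService.
Here is my code: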

service=new HTTPService(); 
service.addEventListener(ResultEvent.RESULT, googleResult); 
service.addEventListener(FaultEvent.FAULT, googleFault); 
service.resultFormat="text"; 
service.url=_googleURL+keyPhrase.text; 
service.send(); 

Cheers,

Answers

Solution:

I created a ProxyHTTPService class that extends HTTPService:

package com.pageone.proxyserv { 

    import mx.rpc.AsyncToken; 
    import mx.rpc.http.mxml.HTTPService; 
    import mx.utils.URLUtil; 

    public class ProxyHTTPService extends HTTPService { 
     private var _finalURL:String; 

     private var _tempURL:String; 

     private var _proxy:Object; 

     private var phpProxyURL:String="http://myserver/proxy.php"; 

     public function ProxyHTTPService(rootURL:String=null) { 
      // Pass rootURL through to HTTPService instead of silently dropping it.
      super(rootURL); 
     } 

     public function get proxy():Object 
     { 
      return _proxy; 
     } 

     public function set proxy(value:Object):void 
     { 
      _proxy = value; 
     } 


     public function get finalURL():String { 
      return _finalURL; 
     } 

     public function set finalURL(value:String):void { 
      _finalURL=value; 
     } 

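     /**
      * Sends the request to the PHP proxy script, passing the real target
      * URL and the upstream proxy ("ip:port") as request parameters.
      */
     override public function send(parameters:Object=null):AsyncToken { 
      this.url=phpProxyURL; 

      var proxyargs:Object=new Object(); 
      proxyargs.proxy=_proxy.ip + ":" + _proxy.port; 

      // Build the query string unencoded; encodeURI below encodes it once.
      // (objectToString URL-encodes by default, which would encode twice.)
      var params:String=URLUtil.objectToString(parameters, "&", false); 
      _tempURL=_finalURL; 
      if(params.length > 0) { 
       _tempURL += (_finalURL.indexOf("?") >= 0 ? "&" : "?") + params; 
      } 
      _tempURL=encodeURI(_tempURL); 
      // Undo double-encoding of ":" and "/" that were already escaped in finalURL.
      _tempURL=replaceAll(_tempURL, "%253A", ":"); 
      _tempURL=replaceAll(_tempURL, "%252F", "/"); 

      proxyargs.url=_tempURL; 

      return super.send(proxyargs); 
     } 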

     private function replaceAll(string:String, find:String, replace:String):String { 
      return string.split(find).join(replace); 
     } 
    } 
} 
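
A minimal sketch of an alternative way to assemble the target URL, for comparison with the send() override above: instead of joining the parameters with URLUtil.objectToString and then passing the whole string through encodeURI, it encodes each key and value exactly once with encodeURIComponent. The function name buildTargetURL is hypothetical and not part of the original answer.

```actionscript
// File: com/pageone/proxyserv/buildTargetURL.as
package com.pageone.proxyserv {

    /**
     * Hypothetical helper (an assumption, not from the original answer):
     * appends percent-encoded query parameters to a base URL.
     */
    public function buildTargetURL(baseURL:String, parameters:Object=null):String {
        var pairs:Array = [];

        if (parameters != null) {
            // Encode every key and value once, so nothing is encoded twice.
            for (var key:String in parameters) {
                pairs.push(encodeURIComponent(key) + "=" +
                           encodeURIComponent(String(parameters[key])));
            }
        }

        // Nothing to append: return the base URL untouched.
        if (pairs.length == 0) {
            return baseURL;
        }

        // Use "&" if the base URL already carries a query string.
        var separator:String = baseURL.indexOf("?") >= 0 ? "&" : "?";
        return baseURL + separator + pairs.join("&");
    }
}
```

Under these assumptions, buildTargetURL("http://www.google.com/search", {q: "StackOverflow"}) returns http://www.google.com/search?q=StackOverflow, which could be assigned to proxyargs.url in place of _tempURL.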

Then I created the server-side script (proxy.php):

<?php 

// Both query parameters are required.
$url = $_GET["url"] or die("require url parameter"); 
$proxyuri = $_GET["proxy"] or die("require proxy parameter"); 

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
// Route the request through the given "ip:port" proxy, without a CONNECT tunnel.
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 0); 
curl_setopt($ch, CURLOPT_PROXY, $proxyuri); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 GTB7.1'); 

// With RETURNTRANSFER off, curl_exec() prints the response body directly,
// so the fetched page becomes this script's output.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0); 
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2); 
curl_setopt($ch, CURLOPT_TIMEOUT, 60); 
curl_setopt($ch, CURLOPT_HEADER, 0); 
curl_exec($ch); 

curl_close($ch); 


?> 

With the PHP page above in place, you can now call ProxyHTTPService from ActionScript like this:

var p:ProxyHTTPService=new ProxyHTTPService(); 
p.addEventListener(ResultEvent.RESULT, resultListener); 
p.addEventListener(FaultEvent.FAULT, faultListener); 
p.resultFormat="text"; // as in the question: return the raw page text
p.finalURL="http://www.google.com/search"; 
p.proxy={ip: "xxx.xxx.xxx.xxx", port:8080}; 
p.send({q: "StackOverflow"}); 
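
Because the proxy is set per service instance as an {ip, port} object, several proxies can be used in turn. The following is only a sketch of one way to do that: the class name ProxyRotator, the method nextProxy, and the default of 5 requests per proxy are all assumptions for illustration and not part of the original answer.

```actionscript
// File: com/pageone/proxyserv/ProxyRotator.as
package com.pageone.proxyserv {

    /**
     * Hypothetical helper (not from the original answer): hands out proxies
     * round-robin, moving to the next one after a fixed number of requests.
     */
    public class ProxyRotator {
        private var _proxies:Array;          // array of {ip:String, port:int}
        private var _requestsPerProxy:int;   // requests before switching
        private var _requestCount:int = 0;   // requests handed out so far

        public function ProxyRotator(proxies:Array, requestsPerProxy:int=5) {
            if (proxies == null || proxies.length == 0) {
                throw new ArgumentError("at least one proxy is required");
            }
            if (requestsPerProxy < 1) {
                throw new ArgumentError("requestsPerProxy must be at least 1");
            }
            _proxies = proxies;
            _requestsPerProxy = requestsPerProxy;
        }

        /** Returns the proxy to use for the next request. */
        public function nextProxy():Object {
            // Integer-divide the counter to find the current slot, then wrap.
            var index:int = int(_requestCount / _requestsPerProxy) % _proxies.length;
            _requestCount++;
            return _proxies[index];
        }
    }
}
```

With a rotator built from a list of placeholder proxies, for example new ProxyRotator([{ip: "xxx.xxx.xxx.xxx", port: 8080}, {ip: "yyy.yyy.yyy.yyy", port: 8080}]), each request would set p.proxy = rotator.nextProxy() before calling p.send(...).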