2013-05-02

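Speeding up cURL page login and scraping

I have a function that logs into a website and searches for a string on the next page. The whole process currently takes about 10 seconds, and I want to see what I can do to speed it up. Is it possible to make the cURL login persist across client sessions, or is there a better way to search the document?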

    // Methods of the scraper class (excerpt)
    public function curlLogin($url, $post_values, $cookieJar) {
        $timeout = 30;

        $curl_connection = curl_init();
        curl_setopt($curl_connection, CURLOPT_URL, $url);
        curl_setopt($curl_connection, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
        curl_setopt($curl_connection, CURLOPT_COOKIEJAR, $cookieJar);   // write cookies here
        curl_setopt($curl_connection, CURLOPT_COOKIEFILE, $cookieJar);  // read cookies from here
        curl_setopt($curl_connection, CURLOPT_COOKIESESSION, 0);
        curl_setopt($curl_connection, CURLOPT_HEADER, 1);
        curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($curl_connection, CURLOPT_POST, 1);
        curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_values);
        curl_setopt($curl_connection, CURLOPT_HTTPHEADER,
            array("Content-type: application/x-www-form-urlencoded"));
        curl_exec($curl_connection); // login response is discarded; only the session cookie matters
        return $curl_connection;
    }

    public function curlPost($curl_connection, $url, $post_values, $cookieJar) {
        $timeout = 30;

        curl_setopt($curl_connection, CURLOPT_URL, $url);
        curl_setopt($curl_connection, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
        curl_setopt($curl_connection, CURLOPT_COOKIEJAR, $cookieJar);
        curl_setopt($curl_connection, CURLOPT_COOKIEFILE, $cookieJar);
        curl_setopt($curl_connection, CURLOPT_COOKIESESSION, 0);
        curl_setopt($curl_connection, CURLOPT_HEADER, 1);
        curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($curl_connection, CURLOPT_POST, 1);
        curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_values);
        curl_setopt($curl_connection, CURLOPT_HTTPHEADER,
            array("Content-type: application/x-www-form-urlencoded"));
        $result = curl_exec($curl_connection);
        return $result;
    }

    // Calling code (inside another method of the same class)
    $cookieJar = tempnam("/tmp", "CURLCOOKIE");

    $curl_connection = $this->curlLogin($login_url, $post_values, $cookieJar);

    $result = $this->curlPost($curl_connection, $next_url, $params, $cookieJar);

    // strpos() returns false when the needle is missing and 0 when it is at the
    // very start, so test with !== false rather than > 0
    if (strpos($result, 'string 1') !== false) {
        $success = true;
        $message = 'string 1 is present';
    } else if (strpos($result, 'string 2') !== false) {
        $success = false;
        $message = 'string 2 is present';
    } else if (strpos($result, 'string 3') !== false) {
        $success = false;
        $message = 'string 3 is present';
    } else {
        $success = false;
        $message = 'None of the above strings are present.';
    }

    curl_close($curl_connection);
    unlink($cookieJar);

Possible duplicate of [php - Fastest way to check presence of text in many domains (above 1000)](http://stackoverflow.com/questions/12891689/php-fastest-way-to-check-presence-of-text-in-many-domains-above-1000), [How to prevent server from overload in curl requests in PHP](http://stackoverflow.com/questions/13461194/how-to-prevent-server-from-overload-curl-requests-in-php/13461652), [php get all images from url which width and height >= 200 more quicker](http://stackoverflow.com/a/10036599/1226894) – Baba 2013-05-02 18:07:16


What does the page load time show when you load these pages in Firefox with Firebug? – Zak 2013-05-02 18:07:32
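The same question can be answered from inside the script, because cURL records how long each phase of a request took. Below is a minimal sketch, assuming the handle has already been passed to `curl_exec()`; the helper name `curlTimingBreakdown()` and its array keys are hypothetical. Each value is the number of seconds from the start of the request until that phase finished, as reported by `curl_getinfo()`.

```php
<?php
// Hypothetical helper: report where the time of one finished cURL request went.
// All values are seconds measured from the start of the request.
function curlTimingBreakdown($ch): array
{
    return [
        'dns'        => curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME),    // name lookup finished
        'connect'    => curl_getinfo($ch, CURLINFO_CONNECT_TIME),       // TCP connection established
        'first_byte' => curl_getinfo($ch, CURLINFO_STARTTRANSFER_TIME), // first response byte arrived
        'total'      => curl_getinfo($ch, CURLINFO_TOTAL_TIME),         // whole request finished
    ];
}

// Usage, right after curl_exec() in curlLogin() or curlPost():
// $result = curl_exec($curl_connection);
// error_log(print_r(curlTimingBreakdown($curl_connection), true));
```

If most of `total` is already used up by `first_byte`, the time is spent waiting on the remote server (for example, on the login), not on the string search.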


Searching for a substring is very fast compared with everything else here; you won't gain anything by looking in that direction. – mzedeler 2013-05-02 18:07:43

Answer


You can avoid logging in every time by reusing the cookie jar.

Create a file named `cookies.txt` in the directory that contains the script, and set `$cookieJar = 'cookies.txt';`
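A small refinement of that step: libcurl opens a relative cookie path against the process's current working directory, which is not necessarily the script's directory, so an absolute path is safer. This is a sketch under that assumption; `cookieJarPath()` is a hypothetical helper, and only the file name `cookies.txt` comes from the answer.

```php
<?php
// Hypothetical helper: return an absolute path to a persistent cookie jar,
// creating an empty file on the first run.
function cookieJarPath(string $dir): string
{
    $path = rtrim($dir, '/') . '/cookies.txt';
    if (!file_exists($path)) {
        touch($path); // empty jar; cURL fills it after the first login
    }
    return $path;
}

// In the script, instead of tempnam():
// $cookieJar = cookieJarPath(__DIR__);
```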

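After the script has run once, remove the call to `curlLogin()` (and the `unlink($cookieJar)` at the end, which would delete the saved cookies). Your `curlPost()` function should then pick up the saved cookies and return the data as long as the session is still logged in. Because `curlPost()` currently reuses the handle that `curlLogin()` returns, you will need to create that handle with `curl_init()` yourself.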
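Rather than deleting the login call by hand, the script can decide for itself whether a login is needed. A minimal sketch, assuming (this is not from the answer) that a non-empty jar means a session was saved and that the site keeps a session valid for roughly `$maxAge` seconds; `needsLogin()` and the 1800-second default are hypothetical and should be adjusted to the real site.

```php
<?php
// Hypothetical helper: true when there is no saved session worth reusing.
function needsLogin(string $cookieJar, int $maxAge = 1800): bool
{
    clearstatcache(true, $cookieJar); // the jar may have changed since PHP last checked it
    if (!is_file($cookieJar) || filesize($cookieJar) === 0) {
        return true; // nothing saved yet: first run
    }
    // Assumed session lifetime: treat an old jar as an expired login.
    return (time() - filemtime($cookieJar)) > $maxAge;
}

// Usage with the question's methods:
// $curl_connection = needsLogin($cookieJar)
//     ? $this->curlLogin($login_url, $post_values, $cookieJar)
//     : curl_init();
// $result = $this->curlPost($curl_connection, $next_url, $params, $cookieJar);
// (no unlink($cookieJar) afterwards, so the session survives to the next run)
```

A stricter variant would also fall back to `curlLogin()` when the page returned by `curlPost()` turns out to be the login form again.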

Remember that `CURLOPT_COOKIEFILE` specifies where cookies are *read* from, and `CURLOPT_COOKIEJAR` is where you want the response cookies *written*.

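So you may not need `CURLOPT_COOKIEJAR` in your `curlPost()` function at all, unless the site updates the session cookie on later requests.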
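The read/write split can be made explicit by building `curlPost()`'s options in one place and adding `CURLOPT_COOKIEJAR` only when response cookies should be saved. This is a sketch, not the answer's code: `curlPostOptions()` and `$saveCookies` are hypothetical names, and it deliberately leaves out the question's header, user-agent, and `CURLOPT_SSL_VERIFYPEER` settings.

```php
<?php
// Hypothetical helper: options for a POST that reuses a saved session.
function curlPostOptions(string $url, string $postValues, string $cookieJar, bool $saveCookies = false): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $postValues,
        CURLOPT_COOKIEFILE     => $cookieJar, // read the saved session cookies
    ];
    if ($saveCookies) {
        $options[CURLOPT_COOKIEJAR] = $cookieJar; // also write back cookies the server sends
    }
    return $options;
}

// Usage:
// curl_setopt_array($curl_connection, curlPostOptions($next_url, $params, $cookieJar));
```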