2010-08-19 88 views
121

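I'm trying to get curl to follow a redirect, but I can't quite get it to work right. I have a string that I want to send to a server as a GET parameter, and I want to get the resulting URL back. How do I find out where curl gets redirected to?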

Example:

String = Kobold Worker
URL = www.wowhead.com/search?q=Kobold+Worker

If you go to that URL, it redirects you to "www.wowhead.com/npc=257". I want curl to return this URL to my PHP code so that I can extract "npc=257" and use it.

Current code:

function npcID($name) { 
    $urltopost = "http://www.wowhead.com/search?q=" . $name; 
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"); 
    curl_setopt($ch, CURLOPT_URL, $urltopost); 
    curl_setopt($ch, CURLOPT_REFERER, "http://www.wowhead.com"); 
    curl_setopt($ch, CURLOPT_HTTPHEADER, Array("Content-Type:application/x-www-form-urlencoded")); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); 
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); 
} 

However, this returns www.wowhead.com/search?q=Kobold+Worker instead of www.wowhead.com/npc=257.

I suspect PHP returns before the external redirect happens. How can I fix this?

+6

This is one of the top results for "curl follow redirects". To make the 'curl' command follow redirects automatically, pass the '-L' or '--location' flag, e.g. 'curl -L http://example.com/' – 2013-09-09 19:09:15

Answers

214

To make curl follow redirects, use:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 

Erm... I don't think you're actually executing curl at all... try:

curl_exec($ch);

...after setting the options and before calling curl_getinfo().
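Putting those two fixes together, a corrected version of the question's function might look like the sketch below. It assumes the search page answers with an HTTP redirect and that the final URL contains `npc=<digits>`; the helper `npcIdFromUrl`, the `urlencode()` of the search term and the `CURLOPT_MAXREDIRS` limit are additions for illustration, not part of the original code.

```php
<?php

// Hypothetical helper: pull the numeric id out of a URL such as
// "http://www.wowhead.com/npc=257". Returns null when no id is present.
function npcIdFromUrl(string $url): ?int
{
    if (preg_match('~[/?&]npc=(\d+)~', $url, $m)) {
        return (int) $m[1];
    }
    return null;
}

// The question's function with curl_exec() actually called, so that
// CURLINFO_EFFECTIVE_URL reports the last URL reached after redirects.
function npcID(string $name): ?int
{
    $ch = curl_init('http://www.wowhead.com/search?q=' . urlencode($name));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the page body out of the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location: headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // assumption: 5 hops is plenty
    curl_exec($ch);                                 // nothing is requested without this
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return npcIdFromUrl($finalUrl);
}
```

Calling `npcID('Kobold Worker')` would then return 257 if the site still redirects the way the question describes.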

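EDIT: If you just want to find out where a page redirects to, I'd use the advice here: just use curl to grab the headers and extract the Location: header from them: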

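$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in $result
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first response
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
$result = curl_exec($ch); 
curl_close($ch); 
$location = null; // stays null when there is no Location: header
if (preg_match('~Location: (.*)~i', $result, $match)) { 
    $location = trim($match[1]); 
} 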
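The regex above scans the whole response, body included, so a page whose content happens to contain the text `Location:` can produce a false match. One way to tighten it, sketched here under the assumption that `$result` holds the headers, a blank line, and then the body (the layout `CURLOPT_HEADER` produces), is to cut the string at the first blank line and only accept a `Location:` line when the status code is 3xx. The function name `redirectLocation` is hypothetical.

```php
<?php

// Hypothetical helper: given the raw output of a request made with
// CURLOPT_HEADER => true and CURLOPT_FOLLOWLOCATION => false, return the
// Location target of a 3xx response, or null otherwise.
function redirectLocation(string $raw): ?string
{
    // The header block ends at the first empty line ("\r\n\r\n" on the wire).
    $parts = preg_split('/\r?\n\r?\n/', $raw, 2);
    $lines = preg_split('/\r?\n/', $parts[0]);

    // The first line is the status line, e.g. "HTTP/1.1 302 Found".
    if (!preg_match('~^HTTP/\S+\s+(\d{3})~', $lines[0], $status)) {
        return null;
    }
    $code = (int) $status[1];
    if ($code < 300 || $code >= 400) {
        return null; // not a redirect, so ignore any Location: text
    }

    // Header names are case-insensitive, so accept "location:" in any case.
    foreach (array_slice($lines, 1) as $line) {
        if (stripos($line, 'location:') === 0) {
            return trim(substr($line, strlen('location:')));
        }
    }
    return null;
}
```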
+1

This makes PHP follow the redirect. I don't want to follow the redirect; I just want to know the URL of the page it redirects to. – 2010-08-19 08:50:33

+8

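Oh, so you don't actually want to fetch the page, just find out the location? In that case I'd suggest the strategy described here: http://zzz.rezo.net/HowTo-Expand-Short-URLs.html - basically, just grab the headers from the redirecting page and take the Location: header from them. Either way, though, you still need to call curl_exec() for curl to actually do anything... – 2010-08-19 09:03:28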

+4

Thanks, this works like a charm :) – 2010-08-19 10:00:32

8

The answer above didn't work for me on one of my servers (something to do with basedir), so I reworked it a bit. The code below works on all of my servers.

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_HEADER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
$a = curl_exec($ch); 
curl_close($ch); 
// the returned headers (followed by the body)
$headers = explode("\n", $a); 
// if there is no redirection this will be the final url 
$redir = $url; 
// loop through the headers and check for a Location: line 
$j = count($headers); 
for ($i = 0; $i < $j; $i++) { 
    // an empty line marks the end of the headers; stop before the body
    if (trim($headers[$i]) === '') {
        break;
    }
    // if we find the Location header (any case), strip the name and fill the redir var
    if (stripos($headers[$i], "Location:") === 0) { 
        $redir = trim(substr($headers[$i], strlen("Location:"))); 
        break; 
    } 
} 
// do whatever you want with the result 
echo $redir; 
+0

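Reading the 'Location:' header like this does not always follow the redirect the way curl itself would. See also the question that covers this explicitly: [curl follow location error](http://stackoverflow.com/questions/2511410/curl-follow-location-error) – hakre 2013-03-13 09:19:50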

4

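The selected answer here is decent, but it is case-sensitive, it doesn't guard against relative Location: headers (which some sites send), and it can be fooled by pages whose content actually contains the phrase "Location:" (Zillow's pages currently do).

A bit sloppy, but here are a couple of quick edits that make it a bit smarter: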

function getOriginalURL($url) { 
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_URL, $url); 
    curl_setopt($ch, CURLOPT_HEADER, true); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
    $result = curl_exec($ch); 
    $httpStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE); 
    curl_close($ch); 

    // if it's not a redirection (3XX), move along 
    if ($httpStatus < 300 || $httpStatus >= 400) 
     return $url; 

    // look for a location: header to find the target URL 
    if(preg_match('/location: (.*)/i', $result, $r)) { 
     $location = trim($r[1]); 

     // if the location is a relative URL, attempt to make it absolute 
     if (preg_match('/^\/(.*)/', $location)) { 
      $urlParts = parse_url($url); 
      $baseURL = ''; 
      // use !empty() so a missing scheme/host/port doesn't raise an undefined-index notice 
      if (!empty($urlParts['scheme'])) 
       $baseURL = $urlParts['scheme'].'://'; 

      if (!empty($urlParts['host'])) 
       $baseURL .= $urlParts['host']; 

      if (!empty($urlParts['port'])) 
       $baseURL .= ':'.$urlParts['port']; 

      return $baseURL.$location; 
     } 

     return $location; 
    } 
    return $url; 
} 

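Note that this still only goes one redirect deep. To go further, you need to keep requesting each new location and following the redirects.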
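Going several hops deep means repeating the header request until the response is no longer a 3xx, resolving each relative `Location:` against the URL that produced it. The sketch below assumes a caller-supplied `$fetch` callable that performs one request with `CURLOPT_FOLLOWLOCATION` off and returns `[status code, raw Location value or null]` (for example, a variant of `getOriginalURL` above); `followRedirects`, `absolutizeLocation` and that `$fetch` contract are hypothetical. The resolver also handles scheme-relative (`//host/path`) and plain relative (`path`) values, which the `^/` check above does not, but it does not collapse `./` or `../` segments the way full RFC 3986 resolution would.

```php
<?php

// Hypothetical helper: turn a Location value into an absolute URL, using the
// URL that was requested as the base. Handles absolute, scheme-relative
// ("//host/path"), root-relative ("/path") and plain relative ("path") values.
function absolutizeLocation(string $base, string $location): string
{
    if (parse_url($location, PHP_URL_SCHEME) !== null) {
        return $location;                       // already absolute
    }

    $p      = parse_url($base);
    $scheme = $p['scheme'] ?? 'http';
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location;       // scheme-relative
    }

    $origin = $scheme . '://' . ($p['host'] ?? '');
    if (isset($p['port'])) {
        $origin .= ':' . $p['port'];
    }
    if (strpos($location, '/') === 0) {
        return $origin . $location;             // root-relative
    }

    // Plain relative: replace everything after the last "/" of the base path.
    $path = $p['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}

// Hypothetical driver: follow up to $maxHops redirects by hand.
// $fetch(string $url) must return [int $status, ?string $location].
function followRedirects(string $url, callable $fetch, int $maxHops = 10): string
{
    for ($hop = 0; $hop < $maxHops; $hop++) {
        [$status, $location] = $fetch($url);
        if ($status < 300 || $status >= 400 || $location === null || $location === '') {
            return $url;                        // not a redirect: this is the final URL
        }
        $url = absolutizeLocation($url, $location);
    }
    throw new RuntimeException("Too many redirects, last URL: $url");
}
```

Keeping the network call behind `$fetch` is a design choice that lets the hop logic be exercised with canned responses; a real `$fetch` would run curl with `CURLOPT_HEADER` on, read `CURLINFO_HTTP_CODE`, and pull out the `Location:` line.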

4

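Sometimes you need the HTTP headers, but at the same time you don't want to return them.

This skeleton takes care of cookies and handles HTTP redirects with recursion. The main idea is to avoid returning HTTP headers to the client code.

You can build a very powerful curl class on top of it: add POST support, etc.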

<?php 

class curl { 

    static private $cookie_file   = ''; 
    static private $user_agent    = ''; 
    static private $max_redirects   = 10; 
    static private $followlocation_allowed = true; 

    function __construct() 
    { 
    // set a file to store cookies 
    self::$cookie_file = 'cookies.txt'; 

    // set some general User Agent 
    self::$user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)'; 

    if (! file_exists(self::$cookie_file) || ! is_writable(self::$cookie_file)) 
    { 
     throw new Exception('Cookie file missing or not writable.'); 
    } 

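    // check for PHP settings that prevent
    // CURLOPT_FOLLOWLOCATION from working (ini_get() may report an enabled
    // flag as "1" rather than "On"; safe_mode was removed in PHP 5.4)
    if (ini_get('open_basedir') != '' || filter_var(ini_get('safe_mode'), FILTER_VALIDATE_BOOLEAN)) 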
    { 
     self::$followlocation_allowed = false; 
    }  
    } 

    /** 
    * Main method for GET requests 
    * @param string $url URI to get 
    * @return string  request's body 
    */ 
    static public function get($url) 
    { 
    $process = curl_init($url);  

    self::_set_basic_options($process); 

    // this function is in charge of output request's body 
    // so DO NOT include HTTP headers 
    curl_setopt($process, CURLOPT_HEADER, 0); 

    if (self::$followlocation_allowed) 
    { 
     // if PHP settings allow it use AUTOMATIC REDIRECTION 
     curl_setopt($process, CURLOPT_FOLLOWLOCATION, true); 
     curl_setopt($process, CURLOPT_MAXREDIRS, self::$max_redirects); 
    } 
    else 
    { 
     curl_setopt($process, CURLOPT_FOLLOWLOCATION, false); 
    } 

    $return = curl_exec($process); 

    if ($return === false) 
    { 
     throw new Exception('Curl error: ' . curl_error($process)); 
    } 

    // test for redirection HTTP codes 
    $code = curl_getinfo($process, CURLINFO_HTTP_CODE); 
    if (in_array($code, array(301, 302, 303, 307, 308))) 
    { 
     curl_close($process); 

     // go to extract new Location URI 
     // (any exception simply propagates to the caller) 
     $location = self::_parse_redirection_header($url); 

     // IMPORTANT return 
     return self::get($location); 
    } 

    curl_close($process); 

    return $return; 
    } 

    static function _set_basic_options($process) 
    { 

    curl_setopt($process, CURLOPT_USERAGENT, self::$user_agent); 
    curl_setopt($process, CURLOPT_COOKIEFILE, self::$cookie_file); 
    curl_setopt($process, CURLOPT_COOKIEJAR, self::$cookie_file); 
    curl_setopt($process, CURLOPT_RETURNTRANSFER, 1); 
    // curl_setopt($process, CURLOPT_VERBOSE, 1); 
    // curl_setopt($process, CURLOPT_SSL_VERIFYHOST, false); 
    // curl_setopt($process, CURLOPT_SSL_VERIFYPEER, false); 
    } 

    static function _parse_redirection_header($url) 
    { 
    $process = curl_init($url);  

    self::_set_basic_options($process); 

    // NOW we need to parse HTTP headers 
    curl_setopt($process, CURLOPT_HEADER, 1); 

    $return = curl_exec($process); 

    if ($return === false) 
    { 
     throw new Exception('Curl error: ' . curl_error($process)); 
    } 

    curl_close($process); 

    // header names are case-insensitive; anchor to the start of a line 
    if (! preg_match('#^Location:\s*(.*)$#mi', $return, $location)) 
    { 
     throw new Exception('No Location found'); 
    } 

    if (self::$max_redirects-- <= 0) 
    { 
     throw new Exception('Max redirections reached trying to get: ' . $url); 
    } 

    return trim($location[1]); 
    } 

} 
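If the goal is just to keep headers out of the body, curl can also hand each header line to a callback via `CURLOPT_HEADERFUNCTION`, which avoids the second request that `_parse_redirection_header()` makes. The callback receives the handle and one raw header line and must return the number of bytes it handled. A minimal sketch; `headerCollector` and `fetchWithHeaders` are hypothetical names. With `CURLOPT_FOLLOWLOCATION` on, the callback sees the headers of every hop, so the sketch starts over at each new status line and ends up holding only the final response's headers.

```php
<?php

// Hypothetical helper: returns a CURLOPT_HEADERFUNCTION callback that stores
// the headers of the most recent response in $headers (lower-cased names).
function headerCollector(array &$headers): callable
{
    return function ($ch, string $line) use (&$headers): int {
        $trimmed = trim($line);
        if (stripos($trimmed, 'HTTP/') === 0) {
            $headers = ['status' => $trimmed];      // a new response (e.g. the next hop)
        } elseif (strpos($trimmed, ':') !== false) {
            [$name, $value] = explode(':', $trimmed, 2);
            $headers[strtolower(trim($name))][] = trim($value);
        }
        return strlen($line);                       // tell curl the whole line was consumed
    };
}

// Hypothetical usage: one request, body returned, headers collected separately.
function fetchWithHeaders(string $url, ?array &$headers = null): string
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, headerCollector($headers));
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? '' : $body;
}
```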
-3

You can use:

$redirectURL = curl_getinfo($ch,CURLINFO_REDIRECT_URL); 
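`CURLINFO_REDIRECT_URL` (available in PHP 5.3.7 and later) reports the URL that a 3xx response points to, which answers "where does this redirect?" directly, but only while `CURLOPT_FOLLOWLOCATION` is off; with it on, curl has already moved on and the value is empty. A minimal sketch; `redirectTarget` is a hypothetical name.

```php
<?php

// Hypothetical helper: return where $url redirects to (one hop only), or null
// if the response is not a redirect or the request fails.
function redirectTarget(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't print the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first response
    $ok     = curl_exec($ch) !== false;
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    curl_close($ch);
    return ($ok && is_string($target) && $target !== '') ? $target : null;
}
```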
14

Add this line to the curl initialization:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 

and call getinfo before curl_close:

$redirectURL = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL); 

E.g.:

$ch = curl_init($url); 
curl_setopt($ch, CURLOPT_HEADER, false); 
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow every redirect hop
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);    // 0 = wait indefinitely to connect
curl_setopt($ch, CURLOPT_TIMEOUT, 60); 
$html = curl_exec($ch); 
// read this before curl_close(): the last URL curl ended up at
$redirectURL = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); 
curl_close($ch); 
+2

I think this is a better solution, because it also resolves multiple redirects. – 2015-04-12 20:24:33

+0

Remember (ok, duh): POST data is not resubmitted after a redirect. That happened in my case, and afterwards I felt stupid, because simply using the proper URL fixed it. – twicejr 2017-05-22 17:57:25
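The behaviour in that comment comes from libcurl: when following a 301, 302 or 303, it switches a POST to GET and drops the body. If the server really expects the POST to be repeated at the new URL, `CURLOPT_POSTREDIR` takes a bitmask (1 = 301, 2 = 302, 4 = 303; the 303 bit needs a reasonably recent libcurl) telling curl to keep posting. A sketch under that assumption; `postFollowingRedirects` is a hypothetical name, and as the comment says, posting straight to the final URL is usually the simpler fix.

```php
<?php

// Hypothetical helper: POST $fields to $url and keep POSTing across
// 301/302/303 redirects instead of letting curl switch to GET.
function postFollowingRedirects(string $url, array $fields): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Bitmask: 1 = 301, 2 = 302, 4 = 303. Without it, these hops are re-sent as GET.
    curl_setopt($ch, CURLOPT_POSTREDIR, 1 | 2 | 4);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? '' : $body;
}
```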