2016-07-15 120 views
1

I am trying to scrape a website that actively blocks bots. How can I send custom headers using PHP Goutte?

With PHP cURL, I use this code to get around the blocking:

$headers = array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding: zip, deflate, sdch',
    'Accept-Language:en-US,en;q=0.8',
    'Cache-Control:max-age=0',
    'User-Agent:' . $user_agents[array_rand($user_agents)], // pick a random user agent
);
$curl_init = curl_init();
curl_setopt($curl_init, CURLOPT_URL, $url);
curl_setopt($curl_init, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl_init, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$output = curl_exec($curl_init);

It works well.

However, my project uses PHP Goutte, and I want to do the same thing with that library:

use Goutte\Client;

$headers2 = array(
    'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding' => 'zip, deflate, sdch',
    'Accept-Language' => 'en-US,en;q=0.8',
    'Cache-Control' => 'max-age=0',
    'User-Agent' => $user_agents[array_rand($user_agents)],
);
$client = new Client();

// Register each header on the Goutte client before sending the request.
foreach ($headers2 as $name => $value) {
    $client->setHeader($name, $value);
}
$resp = $client->request('GET', $url); // returns a DomCrawler instance
echo $resp->html();

But with this code the site I am scraping blocks me, even though it should produce the same request.

How do I set headers correctly with Goutte?
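
One way to rule out a transcription difference between the two header sets is to derive the Goutte-style `name => value` array from the cURL-style list instead of typing it twice. The sketch below is only an illustration: `headerLinesToMap` is a hypothetical helper name, and the `ExampleAgent/1.0` user agent is a placeholder, neither comes from the question.

```php
<?php
/**
 * Convert cURL-style "Name: value" lines into a name => value map.
 * Hypothetical helper, not part of cURL or Goutte.
 */
function headerLinesToMap(array $lines)
{
    $map = [];
    foreach ($lines as $line) {
        // Split on the first colon only; header values may contain more colons.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // skip anything that is not a "Name: value" pair
        }
        $map[trim($parts[0])] = trim($parts[1]);
    }
    return $map;
}

$lines = [
    'Accept-Language:en-US,en;q=0.8',
    'Cache-Control:max-age=0',
    'User-Agent: ExampleAgent/1.0', // placeholder value
];
$map = headerLinesToMap($lines);
print_r($map);
```

The resulting array can be fed to the same `foreach` / `setHeader` loop shown in the question, so both clients are guaranteed to start from identical header values.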

Answers

2

You can try checking the status code of the response that Goutte received:

$status_code = $client->getResponse()->getStatus(); 
echo $status_code; 

Here is the source code that worked for me with Guzzle. In the index.php file:

<?php 
    ini_set('display_errors', 1); 
?> 
<html> 
<head><meta charset="utf-8" /></head> 
<?php 
    $begin = microtime(true); 
    require 'vendor/autoload.php'; 
    require 'helpers/helper.php'; 
    $client = new GuzzleHttp\Client([
        'base_uri' => 'http://www.yellowpages.com.au',
        'cookies' => true,
        'headers' => [
            'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Encoding' => 'zip, deflate, sdch',
            'Accept-Language' => 'en-US,en;q=0.8',
            'Cache-Control' => 'max-age=0',
            'User-Agent' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0'
        ]
    ]);
    $helper = new Helper($client); 
    $mostViewed = $helper->getPageTest(); 
?> 
</html>

In the helper.php file:

<?php 
use GuzzleHttp\ClientInterface; 
use Symfony\Component\DomCrawler\Crawler; 
class Helper
{
    protected $client;
    protected $totalPages;

    public function __construct(ClientInterface $client)
    {
        $this->client = $client;
        $this->totalPages = 3;
    }

    // Send the search request; the path is resolved against the client's base_uri.
    public function query()
    {
        $queries = array(
            'clue' => 'Builders',
            // Raw value: Guzzle URL-encodes query arrays itself, so a
            // pre-encoded 'Sydney%2C+2000' would be encoded a second time.
            'locationClue' => 'Sydney, 2000',
            'mappable' => 'true',
            'selectedViewMode' => 'list',
        );
        return $this->client->get('search/listings', array('query' => $queries));
    }

    // Fetch the page and dump the raw HTML for inspection.
    public function getPageTest()
    {
        $responses = $this->query();
        $html = $responses->getBody()->getContents();
        echo $html;
        exit();
    }
}
?> 
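
A note on the `query` array: Guzzle builds the query string from it, so values get percent-encoded at that point. The sketch below uses PHP's built-in `http_build_query` to illustrate what happens to a raw value versus one that was already encoded by hand. It assumes Guzzle's encoding behaves similarly; the exact encoding of spaces (`+` versus `%20`) may differ from what is shown here.

```php
<?php
// Raw, human-readable values: the encoder escapes them exactly once.
$raw = [
    'clue' => 'Builders',
    'locationClue' => 'Sydney, 2000',
    'mappable' => 'true',
    'selectedViewMode' => 'list',
];

// Same parameters, but locationClue was percent-encoded by hand beforehand.
$preEncoded = $raw;
$preEncoded['locationClue'] = 'Sydney%2C+2000';

$rawQuery = http_build_query($raw, '', '&');
$doubleQuery = http_build_query($preEncoded, '', '&');

echo $rawQuery, PHP_EOL;    // locationClue=Sydney%2C+2000
echo $doubleQuery, PHP_EOL; // locationClue=Sydney%252C%2B2000 (encoded twice)
```

With the raw value, the result matches the query string of the target URL; with the pre-encoded value, the `%` and `+` characters are themselves escaped, so the server sees a different location string.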

And this is the result I got:

[image: screenshot of the result]

Hope this helps!
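
For anyone who wants to stay with Goutte: it is built on Symfony BrowserKit, whose `request()` method accepts server parameters as its fifth argument, and `HTTP_`-prefixed entries there are generally treated as request headers. The sketch below converts a header array into that form. `headersToServerParams` is a hypothetical helper name, the user agent is a placeholder, and whether this gets past the blocking on this particular site is untested.

```php
<?php
/**
 * Convert header names to CGI-style server parameter keys,
 * e.g. "User-Agent" becomes "HTTP_USER_AGENT".
 * Hypothetical helper, not part of Goutte or BrowserKit.
 */
function headersToServerParams(array $headers)
{
    $server = [];
    foreach ($headers as $name => $value) {
        $key = strtoupper(str_replace('-', '_', $name));
        // By CGI convention, Content-Type and Content-Length carry no HTTP_ prefix.
        if ($key !== 'CONTENT_TYPE' && $key !== 'CONTENT_LENGTH') {
            $key = 'HTTP_' . $key;
        }
        $server[$key] = $value;
    }
    return $server;
}

$server = headersToServerParams([
    'Accept-Language' => 'en-US,en;q=0.8',
    'User-Agent' => 'ExampleAgent/1.0', // placeholder value
]);
print_r($server);

// Assumed usage with the $client and $url from the question:
// $crawler = $client->request('GET', $url, [], [], $server);
```

Passing headers per request this way keeps them visible at the call site, which makes it easier to compare against the cURL version when debugging.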

+0

http://www.yellowpages.com.au/search/listings?clue=Builders&locationClue=Sydney%2C+2000&mappable=true&selectedViewMode=list is the URL. – Umair

+0

Hi, could you use Guzzle instead of Goutte in this case? I have had success with Guzzle. –

+0

I have updated my source code; it works well with Guzzle. Usage is similar to Goutte. –