2017-02-22 162 views

PHP cURL 405 Not Allowed

Final Update: It appears that the targeted website blocked DigitalOcean IPs, which caused the problems I had been troubleshooting for days. I spun up an EC2 instance and got the code working, together with caching and so on, to reduce the hits on the website and allow my users to share it.


UPDATE: I managed to get the HTML by turning off cURL's fail-on-error behavior; however, besides returning the 405 error, the website is also not setting some cookies that are required for its content to load.

curl_setopt($ch, CURLOPT_FAILONERROR, false);

I use the code below (AJAX → PHP) to retrieve a website's og: meta tags. However, one or two specific websites return errors and no information is retrieved; the errors are shown below. The code works seamlessly for most websites.

Warning: DOMDocument::loadHTML(): Empty string supplied as input in /my/home/path/getUrlMeta.php on line 58

From curl_error in my error_log:

The requested URL returned error: 405 Not Allowed

And:

Failed to connect to www.something.com port 443: Connection refused

I have no problem getting the website's HTML when I use curl from my server's console, and no problem retrieving the required information from most websites with the following code:

function file_get_contents_curl($url) 
{ 
    $ch = curl_init(); 
    $header[0] = "Accept: text/html, text/xml,application/xml,application/xhtml+xml,"; 
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"; 
    $header[] = "Cache-Control: max-age=0"; 
    $header[] = "Connection: keep-alive"; 
    $header[] = "Keep-Alive: 300"; 
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7"; 
    $header[] = "Accept-Language: en-us,en;q=0.5"; 
    $header[] = "Pragma: no-cache"; 
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header); 

    curl_setopt($ch, CURLOPT_HEADER, 0); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
    curl_setopt($ch, CURLOPT_URL, $url); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
    curl_setopt($ch, CURLOPT_FAILONERROR, true); 
    curl_setopt($ch, CURLOPT_TIMEOUT, 30); 
    curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 "); 
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false); 
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
    //The two options above make the request work with sites like www.nytimes.com 

    //Update: Added option for cookie jar since some websites recommended it. cookies.txt is set to permission 777. Still doesn't work. 
    $cookiefile = '/home/my/folder/cookies.txt'; 
    curl_setopt($ch, CURLOPT_COOKIESESSION, true); 
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile); 
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile); 

    $data = curl_exec($ch); 

    if(curl_error($ch)) 
    { 
     error_log(curl_error($ch)); 
    } 
    curl_close($ch); 

    return $data; 
} 

$html = file_get_contents_curl($url); 

libxml_use_internal_errors(true); // suppress libxml parse warnings rather than masking them with @ 
$doc = new DOMDocument(); 
$doc->loadHTML($html); 
$xpath = new DOMXPath($doc); 
$query = '//*/meta[starts-with(@property, \'og:\')]'; 
$metas = $xpath->query($query); 
$rmetas = array(); 
foreach ($metas as $meta) { 
    $property = substr($meta->getAttribute('property'),3); 
    $content = $meta->getAttribute('content'); 
    $rmetas[$property] = $content; 
} 

/* The code below falls back to the first image wider than 600px when og:image is empty. */
if (empty($rmetas['image'])) { 
    //$src = $xpath->evaluate("string(//img/@src)"); 
    //echo "src=" . $src . "\n"; 
    $query = '//*/img'; 
    $srcs = $xpath->query($query); 
    foreach ($srcs as $src) { 

     $property = $src->getAttribute('src'); 


     if (substr($property, 0, 4) == 'http' && in_array(substr($property, -3), array('jpg', 'png', 'peg'), true)) { 
         if (list($width, $height) = getimagesize($property)) { 
             if ($width > 600) { 
                 $rmetas['image'] = $property; 
                 break; // stop at the first image wider than 600px 
             } 
         } 
     } 

    } 
} 

echo json_encode($rmetas); 


die(); 
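The extraction logic above can be exercised offline against a fixed HTML string, which helps separate DOM/XPath problems from cURL problems. The HTML below is a made-up sample, not content from the site in question:

```php
<?php
// Parse og: meta tags from a hard-coded HTML sample; no network involved.
$html = '<html><head>'
      . '<meta property="og:title" content="Example Title">'
      . '<meta property="og:image" content="http://example.com/pic.jpg">'
      . '</head><body></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$metas = $xpath->query("//*/meta[starts-with(@property, 'og:')]");

$rmetas = array();
foreach ($metas as $meta) {
    // Drop the "og:" prefix so keys become "title", "image", ...
    $rmetas[substr($meta->getAttribute('property'), 3)] = $meta->getAttribute('content');
}
echo json_encode($rmetas);
```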

UPDATE: Error on my part: the website is not HTTPS-enabled, so the connection-refused error on port 443 is expected, but I still have the 405 Not Allowed error.

curl_getinfo output:

{ 
    "url": "http://www.example.com/", 
    "content_type": null, 
    "http_code": 405, 
    "header_size": 0, 
    "request_size": 458, 
    "filetime": -1, 
    "ssl_verify_result": 0, 
    "redirect_count": 0, 
    "total_time": 0.326782, 
    "namelookup_time": 0.004364, 
    "connect_time": 0.007725, 
    "pretransfer_time": 0.007867, 
    "size_upload": 0, 
    "size_download": 0, 
    "speed_download": 0, 
    "speed_upload": 0, 
    "download_content_length": -1, 
    "upload_content_length": -1, 
    "starttransfer_time": 0.326634, 
    "redirect_time": 0, 
    "redirect_url": "", 
    "primary_ip": "SOME IP", 
    "certinfo": [], 
    "primary_port": 80, 
    "local_ip": "SOME IP", 
    "local_port": 52966 
} 

Update: If I do a curl -i from the console I get the following response: a 405 error, but it is followed by all the HTML that I need.

Home> curl -i http://www.domain.com 
HTTP/1.1 405 Not Allowed 
Server: nginx 
Date: Wed, 22 Feb 2017 17:57:03 GMT 
Content-Type: text/html; charset=UTF-8 
Transfer-Encoding: chunked 
Vary: Accept-Encoding 
Vary: Accept-Encoding 
Set-Cookie: PHPSESSID2=ko67tfga36gpvrkk0rtqga4g94; path=/; domain=.domain.com 
Expires: Thu, 19 Nov 1981 08:52:00 GMT 
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0 
Pragma: no-cache 
Set-Cookie: __PAGE_REFERRER=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=www.domain.com 
Set-Cookie: __PAGE_SITE_REFERRER=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=www.domain.com 
X-Repository: legacy 
X-App-Server: production-web23:8018 
X-App-Server: distil2-kvm:80 

If it only stops working on certain sites, it is a server-side problem. There is nothing we can do about it. – miken32


@miken32 But the URL is accessible from a web browser, and doesn't cURL simulate a browser? It is a publicly accessible website: no login needed, no SSL, etc. –


Remove CURLOPT_FAILONERROR and you will get the full content of the 405 response, just like the command line you showed. –
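That suggestion can be sketched as follows; fetch_despite_405 is a hypothetical helper, not part of the original code. With CURLOPT_FAILONERROR left at its default (false), curl_exec() returns the response body even for 4xx/5xx statuses, and the status code can be read with curl_getinfo():

```php
<?php
// Fetch a URL and return both the HTTP status code and the body,
// even when the server answers 405 (hypothetical helper).
function fetch_despite_405($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // No CURLOPT_FAILONERROR here: cURL then hands back the body for 4xx/5xx too.
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return array($code, $body);
}
```

With this, a 405 response would still yield the HTML seen in the `curl -i` output above.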

Answer


Add the following to your code to help debug the problem:

$info = curl_getinfo($ch); 
print_r($info); 

More likely than not, the problem is one of the following:

  • 405 Not Allowed: you are attempting a cURL call that is not allowed, for example making a GET call when only POST is permitted.
  • 443: Connection refused: the website you are trying to reach does not support HTTPS. Alternatively, the site uses an encryption protocol your code does not support, for example it only accepts TLSv1.2 while your code uses TLSv1.1.
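For the second case, one option (assuming cURL ≥ 7.34.0 and PHP ≥ 5.5, where the constant exists) is to pin the TLS version explicitly. This is a sketch against a placeholder URL, not a confirmed fix for the site in question:

```php
<?php
// Create a handle and force TLSv1.2 if the constant is available.
$ch = curl_init('https://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
if (defined('CURL_SSLVERSION_TLSv1_2')) {
    curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_2);
}
// $ch is now configured; curl_exec($ch) would perform the request.
```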

I have added the curl_getinfo output to my question. The site is publicly accessible; I am trying to get the og tags when a user shares a website URL in my app (think Facebook URL sharing). –


Turns out the website does not use HTTPS, so I do not need to fix the connection-refused error, but I still have not been able to resolve the 405 error. –


Have you tried accessing the URL that returns the 405 in a browser? Does that URL allow GET requests? –