2016-06-11

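How can I find all <meta> tags in a file that are not inside HTML comments (<!-- -->)?

I am trying to find all <meta> tags in a file that are not inside HTML comments (<!-- -->) and extract their content with the PHP function get_meta_tags. However, I run into two problems when using this function: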

  1. Even when <meta> tags are inside a comment, for example:

    <!-- 
    <meta name="title" content="Title name"> 
    <meta name="keywords" content="keyword 1, keyword 2, keyword 3"> 
    <meta name="description" content="Hello world!"> 
    <meta name="author" content="Author name"> 
    <meta name="copyright" CONTENT="All rights reserved."> 
    <meta property="og:title" content="Title name" /> 
    <meta property="og:image" content="http://www.example.com/img/logo.gif" /> 
    <meta property="og:description" content="Hello world!" /> 
    --> 

    get_meta_tags still extracts the <meta> tags into the array, whether they are inside a comment or not. What I need is to extract only the <meta> tags outside HTML comments. In other words, I only want the <meta> tags that actually take effect on the page.

  2. If a <meta> tag has no name attribute, for example tags that only have a property or http-equiv attribute such as property="og:title" or http-equiv="refresh", get_meta_tags does not extract it into the array.

What should I do to solve these two problems? Thanks.

Answer

Check this out:

function get_meta_tags2($url) 
{ 
$result = false; 

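// Fetch the page first; file_get_contents() returns false on failure.
$contents = file_get_contents($url); 

if (isset($contents) && is_string($contents)) 
{ 
    // Remove whole HTML comments (delimiters and body) so that <meta> tags
    // inside them are not matched below. Stripping only the '<!--' and '-->'
    // delimiters would un-comment those tags instead of hiding them.
    $contents = preg_replace('/<!--.*?-->/s', '', $contents); 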
    $title = null; 
    $metaTags = null; 

    preg_match('/<title>([^>]*)<\/title>/si', $contents, $match); 

    if (isset($match) && is_array($match) && count($match) > 0) 
    { 
     $title = strip_tags($match[1]); 
    } 

    preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match); 

    if (isset($match) && is_array($match) && count($match) == 3) 
    { 
     $originals = $match[0]; 
     $names = $match[1]; 
     $values = $match[2]; 

     if (count($originals) == count($names) && count($names) == count($values)) 
     { 
      $metaTags = array(); 

      for ($i=0, $limiti=count($names); $i < $limiti; $i++) 
      { 
       $metaTags[$names[$i]] = array (
        'html' => htmlentities($originals[$i]), 
        'value' => $values[$i] 
       ); 
      } 
     } 
    } 

    $result = array (
     'title' => $title, 
     'metaTags' => $metaTags 
    ); 
} 

return $result; 
} 

The output will be:

<?php 
Array 
(
[title] => Teleit.pl - strony internetowe 
[metaTags] => Array 
    (
     [description] => Array 
      (
       [html] => <meta name="description" content="Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well." /> 
       [value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well. 
      ) 

     [DC.title] => Array 
      (
       [html] => <meta name="DC.title" content="Mariano Iglesias - Weblog" /> 
       [value] => Mariano Iglesias - Weblog 
      ) 

     [ICBM] => Array 
      (
       [html] => <meta name="ICBM" content="-34.6017, -58.3956" /> 
       [value] => -34.6017, -58.3956 
      ) 

     [geo.position] => Array 
      (
       [html] => <meta name="geo.position" content="-34.6017;-58.3956" /> 
       [value] => -34.6017;-58.3956 
      ) 

     [geo.region] => Array 
      (
       [html] => <meta name="geo.region" content="AR-BA"> 
       [value] => AR-BA 
      ) 

     [geo.placename] => Array 
      (
       [html] => <meta name="geo.placename" content="Buenos Aires"> 
       [value] => Buenos Aires 
      ) 

    ) 

) 
?> 

Credit for the original version goes to mariano at cricava dot com; I modified it for you.
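
The regular expression in get_meta_tags2 only matches tags written as name="..." followed by content="...", so tags that use property or http-equiv are still skipped. As an alternative, here is a minimal sketch that lets PHP's bundled DOM extension parse the markup instead. Commented-out markup becomes a comment node rather than elements, so <meta> tags inside <!-- --> are never visited. The function name extract_live_meta_tags is hypothetical (it is not part of PHP), and the sketch assumes that keying the result by name, property, or http-equiv (whichever is found first) is acceptable for your use.

```php
<?php
// Hypothetical helper, not a PHP built-in: returns [identifier => content]
// for every <meta> element the HTML parser sees. Markup inside <!-- ... -->
// is parsed as a comment node, so tags in comments are not included.
function extract_live_meta_tags(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings quietly
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Assumption: the first of these attributes that exists identifies the tag.
        foreach (['name', 'property', 'http-equiv'] as $attr) {
            if ($meta->hasAttribute($attr)) {
                $tags[$meta->getAttribute($attr)] = $meta->getAttribute('content');
                break;
            }
        }
    }

    return $tags;
}
```

For a remote page you could pass in the fetched source, for example extract_live_meta_tags(file_get_contents($url)), after checking that file_get_contents did not return false. Note that later tags with the same identifier overwrite earlier ones, and that loadHTML guesses the character encoding when the page does not declare one.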
