How do I find all <meta> tags in a file that are not inside HTML comments (<!-- -->)?

I am trying to find all <meta> tags in a file that are not inside HTML comments (<!-- -->) and extract their content with the PHP function get_meta_tags. However, I run into two problems with this function.

First, when the <meta> tags are inside a comment, such as:

<!-- 
<meta name="title" content="Title name"> 
<meta name="keywords" content="keyword 1, keyword 2, keyword 3"> 
<meta name="description" content="Hello world!"> 
<meta name="author" content="Author name"> 
<meta name="copyright" CONTENT="All rights reserved."> 
<meta property="og:title" content="Title name" /> 
<meta property="og:image" content="http://www.example.com/img/logo.gif" /> 
<meta property="og:description" content="Hello world!" /> 
-->

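get_meta_tags still extracts all of these <meta> tags into the array, whether or not they are inside a comment. What I need is only the <meta> tags outside HTML comments, that is, the ones that are actually in effect on the page.

Second, if a <meta> tag has no name attribute, for example a tag that only has property or http-equiv (such as property="og:title" or http-equiv="refresh"), get_meta_tags does not extract it into the array.

How can I solve these two problems? Thanks.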

Source

2016-06-11 Banana Code

Try this:

function get_meta_tags2($url)
{
    $result = false;

    // Fetch the page; file_get_contents() returns false on failure.
    $contents = file_get_contents($url);

    if (is_string($contents))
    {
        // Remove whole HTML comments (delimiters and contents) so that
        // commented-out <meta> tags are not matched below.
        $contents = (string) preg_replace('/<!--.*?-->/s', '', $contents);

        $title = null;
        $metaTags = null;

        preg_match('/<title>([^>]*)<\/title>/si', $contents, $match);

        if (isset($match) && is_array($match) && count($match) > 0)
        {
            $title = strip_tags($match[1]);
        }

        // Accept name, property, or http-equiv as the identifying attribute.
        // Note: the pattern expects that attribute to come right before content.
        preg_match_all('/<\s*meta\s+(?:name|property|http-equiv)\s*=\s*"?' . '([^>"]*)"?\s*' . 'content\s*=\s*"?([^>"]*)"?\s*\/?\s*>/si', $contents, $match);

        if (isset($match) && is_array($match) && count($match) == 3)
        {
            $originals = $match[0];
            $names = $match[1];
            $values = $match[2];

            if (count($originals) == count($names) && count($names) == count($values))
            {
                $metaTags = array();

                for ($i = 0, $limiti = count($names); $i < $limiti; $i++)
                {
                    $metaTags[$names[$i]] = array(
                        'html' => htmlentities($originals[$i]),
                        'value' => $values[$i]
                    );
                }
            }
        }

        $result = array(
            'title' => $title,
            'metaTags' => $metaTags
        );
    }

    return $result;
}

The output will look like this:

Array
(
    [title] => Teleit.pl - strony internetowe
    [metaTags] => Array
        (
            [description] => Array
                (
                    [html] => <meta name="description" content="Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well." />
                    [value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well.
                )

            [DC.title] => Array
                (
                    [html] => <meta name="DC.title" content="Mariano Iglesias - Weblog" />
                    [value] => Mariano Iglesias - Weblog
                )

            [ICBM] => Array
                (
                    [html] => <meta name="ICBM" content="-34.6017, -58.3956" />
                    [value] => -34.6017, -58.3956
                )

            [geo.position] => Array
                (
                    [html] => <meta name="geo.position" content="-34.6017;-58.3956" />
                    [value] => -34.6017;-58.3956
                )

            [geo.region] => Array
                (
                    [html] => <meta name="geo.region" content="AR-BA">
                    [value] => AR-BA
                )

            [geo.placename] => Array
                (
                    [html] => <meta name="geo.placename" content="Buenos Aires">
                    [value] => Buenos Aires
                )

        )

)

Credit for the original version goes to Mariano at cricava dot com; I modified it for you.
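
Because get_meta_tags2() reads from a URL, the comment-stripping step is easier to try in isolation on a string. The following is only a minimal sketch: strip_html_comments() is a hypothetical helper name, not a PHP built-in, and the regular expression assumes comments are plain <!-- ... --> blocks. It would also remove comment-like text inside <script> blocks or conditional comments, which may or may not be what you want.

```php
<?php
// Hypothetical helper (not a PHP built-in): remove HTML comments from a string.
function strip_html_comments(string $html): string
{
    // The /s flag lets "." match newlines, so multi-line comments are removed;
    // ".*?" is lazy, so each comment ends at the first "-->".
    $stripped = preg_replace('/<!--.*?-->/s', '', $html);

    // preg_replace() returns null on error; fall back to the input in that case.
    return $stripped === null ? $html : $stripped;
}

// Sample input: one commented-out tag and one live tag.
$html = "<!--\n<meta name=\"author\" content=\"Hidden\">\n-->\n"
      . '<meta name="description" content="Hello world!">';

echo strip_html_comments($html), "\n";
```

Only the live description tag should remain in the printed result; the commented-out author tag is removed together with its delimiters.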

Source

2016-06-11 17:44:46 PawelN
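
An alternative to regular expressions is to let an HTML parser do the work, since a parser treats <!-- ... --> as a comment node and never turns its contents into elements. The sketch below uses PHP's DOMDocument (it requires the DOM extension) and addresses both problems from the question. collect_meta_tags() is a hypothetical name chosen for illustration, and keying the result by name, property, or http-equiv is an assumption about the desired array shape, not something get_meta_tags itself does.

```php
<?php
// Hypothetical helper: collect live <meta> tags from an HTML string.
// Tags inside HTML comments are skipped because the parser stores
// comment contents as a comment node, not as elements.
function collect_meta_tags(string $html): array
{
    $doc = new DOMDocument();

    // Real-world markup is often invalid; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html); // note: $html must not be an empty string
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Key each tag by the first identifying attribute it carries.
        foreach (['name', 'property', 'http-equiv'] as $attr) {
            if ($meta->hasAttribute($attr)) {
                $tags[$meta->getAttribute($attr)] = $meta->getAttribute('content');
                break;
            }
        }
    }

    return $tags;
}

// Usage with an inline sample; for a real page, pass the result of
// file_get_contents($url) after checking that it is a string.
$sample = '<html><head>'
    . '<!-- <meta name="author" content="Hidden"> -->'
    . '<meta name="description" content="Hello world!">'
    . '<meta property="og:title" content="Title name" />'
    . '<meta http-equiv="refresh" content="30">'
    . '</head><body></body></html>';

print_r(collect_meta_tags($sample));
```

If two tags share the same key, the later one overwrites the earlier one in this sketch; collect them into nested arrays instead if duplicates matter.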
