Loading HTML source code into a string in PHP

I want to load the HTML source of a remote page into a string in PHP, using this awesome Galantis music video, https://www.youtube.com/watch?v=5XR7naZ_zZA, as an example.

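I then want to search the source for a specific div id, "action-panel-details", and confirm when it is found. With the code below, the entire page simply loads on the page I'm running on my server.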

Is this even possible with file_get_contents()? This is the code that loads the whole page, video and all:

<?php 

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA'); 

if(preg_match("~action-panel-details~", $str)){ 
echo "it's there"; 
} 

?>

I have also tried using simplexml_load_file(), and that attempt ended with these errors:

Warning: simplexml_load_string(): Entity: line 1: parser error : xmlParseEntityRef: no name in /page.php on line 5 

Warning: simplexml_load_string(): ndow, document);</script><script>var ytcfg = {d: function() {return (window.yt & in /page.php on line 5 

Warning: simplexml_load_string(): ^ in /page.php on line 5 

Warning: simplexml_load_string(): Entity: line 1: parser error : xmlParseEntityRef: no name in /page.php on line 5

This is the code that produces them:

<?php 

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA'); 

$str = simplexml_load_string($str); 

if(preg_match("~watch-time-text~", $str)){ 
echo "it's there"; 
} 

?>

Any help is greatly appreciated.

2016-01-20 bethbee

Yes, you're very close. Basically, just scrap the part where you try to load the page as XML, because the page source is HTML, not XML.

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA'); 

if(preg_match("~watch-time-text~", $str)){ 
    print "Match was found!"; 
} 
else { 
    print "No match was found. :("; 
}

This will display:

Match was found!
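
Because the text being searched for is a fixed string rather than a pattern, a slightly more defensive variant of the snippet above could check that the fetch succeeded and use strpos() instead of preg_match(). This is only a sketch: the helper names fetchSource() and sourceContains() are made up for illustration, and reading a URL with file_get_contents() also assumes allow_url_fopen is enabled on the server.

```php
<?php
// Hypothetical helper: returns the source as a string, or null if the read failed.
function fetchSource(string $location): ?string
{
    // file_get_contents() returns false on failure and emits a warning;
    // @ suppresses the warning so the caller can deal with null instead.
    $source = @file_get_contents($location);

    return $source === false ? null : $source;
}

// Hypothetical helper: literal substring check, no regular expression needed.
function sourceContains(?string $source, string $needle): bool
{
    return $source !== null && strpos($source, $needle) !== false;
}
```

It would be called as sourceContains(fetchSource('https://www.youtube.com/watch?v=5XR7naZ_zZA'), 'watch-time-text'), which yields false both when the string is absent and when the page could not be fetched.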

Unfortunately, I can't show you a demo, because neither ideone.com nor codepad.org will let me use file_get_contents, but it works from my own server.
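
To see why the XML route fails, here is a small sketch that feeds an HTML-like fragment to simplexml_load_string() and collects the parser errors instead of printing warnings. The helper name xmlParseErrors() and the sample fragment are assumptions made for illustration; the fragment only imitates the "window.yt &" script visible in the warnings from the question.

```php
<?php
// Hypothetical helper: tries to parse $markup as XML and returns the libxml
// error messages. An empty array means the markup parsed cleanly.
function xmlParseErrors(string $markup): array
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    simplexml_load_string($markup);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// In XML, "&" must start an entity reference, so a bare "&&" inside an
// inline script is a parse error even though it is fine in HTML.
$fragment = '<html><script>var ok = (window.yt && window.yt.config_);</script></html>';
print_r(xmlParseErrors($fragment));
```

An XML parser treats the inline JavaScript as ordinary text, so every unescaped "&" is an error; that is why the page has to be handled as HTML.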

If you run into a situation where file_get_contents isn't allowed, you can do as miglio suggested and use cURL to fetch the remote source instead. The rest is the same:

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, 'https://www.youtube.com/watch?v=5XR7naZ_zZA'); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
$str = curl_exec($ch); 
curl_close($ch); 


if(preg_match("~watch-time-text~", $str)){ 
    print "Match was found!"; 
} 
else { 
    print "No match was found. :("; 
}
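
The cURL snippet above assumes the request always succeeds. As a hedged sketch, the same calls can be wrapped in a function that reports failure explicitly; the name fetchWithCurl(), the ten-second default timeout, and the choice to accept only 2xx responses are all assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: fetches $url with cURL and returns the response body,
// or null if the extension is missing, the transfer fails, or the status is not 2xx.
function fetchWithCurl(string $url, int $timeoutSeconds = 10): ?string
{
    if (!function_exists('curl_init')) {
        return null; // the cURL extension is not loaded
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    // curl_exec() returns false on failure (DNS error, timeout, TLS problem, ...).
    $body = curl_exec($ch);
    if ($body === false) {
        return null;
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return ($status >= 200 && $status < 300) ? $body : null;
}
```

The result can then go through the same preg_match() check as before, after first testing it against null.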

2016-01-20 22:25:14 Quixrick

Thank you very much. The first solution worked for me. – bethbee

Maybe use cURL:

//$url = 'https://www.youtube.com/'; 
$url = "https://www.youtube.com/watch?v=5XR7naZ_zZA"; 
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
// Caution: this disables TLS certificate verification; avoid it outside quick local tests.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
$content = curl_exec($ch); 
curl_close($ch); 

if(preg_match("~watch-time-text~", $content)){ 
    echo "it's there"; 
}else{ 
    echo 'is another page'; 
} 

To print the document source:
echo "<pre>".htmlentities($content)."</pre>"; 

The match on 'watch-time-text' corresponds to this HTML:
<div id="action-panel-details" class="action-panel-content yt-uix-expander 
yt-uix-expander-collapsed yt-card yt-card-has-padding"> 
<div id="watch-description" class="yt-uix-button-panel"> 
<div id="watch-description-content"> 
<div id="watch-description-clip"><span id="watch-description-badges"></span> 
<div id="watch-uploader-info"><strong class="watch-time-text">
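
Since the question asks about a specific div id, the markup excerpt above can also be queried structurally with PHP's DOM extension rather than by string matching. The following is a minimal sketch, assuming the fetched page still contains that markup; the helper name textInsideId() and the sample text "Published on some date" are placeholders invented for illustration.

```php
<?php
// Hypothetical helper: returns the trimmed text of the first element carrying
// class $class inside the element whose id is $id, or null if nothing matches.
function textInsideId(string $html, string $id, string $class): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Real-world HTML is rarely strictly valid, so keep parser noise internal.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('*') as $element) {
        if ($element->getAttribute('id') !== $id) {
            continue;
        }
        // Look through every descendant of the matching element.
        foreach ($element->getElementsByTagName('*') as $node) {
            $classes = preg_split('/\s+/', trim($node->getAttribute('class')));
            if (in_array($class, $classes, true)) {
                return trim($node->textContent);
            }
        }
    }

    return null;
}

// Placeholder markup modelled on the excerpt above; the text content is invented.
$sample = '<div id="action-panel-details" class="action-panel-content">'
    . '<div id="watch-uploader-info"><strong class="watch-time-text">'
    . 'Published on some date</strong></div></div>';

var_dump(textInsideId($sample, 'action-panel-details', 'watch-time-text'));
```

Unlike the regex check, this only reports a hit when the class actually appears inside the div with that id, not when the string merely occurs somewhere in a script or comment.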

2016-01-20 21:41:17 miglio

Thanks for your response. – bethbee
