2016-01-20 91 views

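Load HTML source code into a string in PHP

I want to load the HTML source of a remote page into a string in PHP, using this awesome Galantis music video, https://www.youtube.com/watch?v=5XR7naZ_zZA, as an example.

Then I want to search the source for a specific div id, "action-panel-details", and confirm when it is found. With the code below, the entire page simply loads into the page I'm running on my server.

Is this even possible with file_get_contents()? This is the code that loads the whole page, video and all: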

<?php 

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA'); 

if(preg_match("~action-panel-details~", $str)){ 
echo "it's there"; 
} 

?> 

I have also tried using simplexml_load_file(), and that attempt ended with this error:

Warning: simplexml_load_string(): Entity: line 1: parser error : xmlParseEntityRef: no name in /page.php on line 5 

Warning: simplexml_load_string(): ndow, document);</script><script>var ytcfg = {d: function() {return (window.yt & in /page.php on line 5 

Warning: simplexml_load_string(): ^ in /page.php on line 5 

Warning: simplexml_load_string(): Entity: line 1: parser error : xmlParseEntityRef: no name in /page.php on line 5 

This is the code that produces it:

<?php 

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA'); 

$str = simplexml_load_string($str); 

if(preg_match("~watch-time-text~", $str)){ 
echo "it's there"; 
} 

?> 

Any help is greatly appreciated.

Answers


Yes, you are very close. Basically, just scrap the part where you try to load the page as XML, because the page source is HTML, not XML.

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA'); 

if(preg_match("~watch-time-text~", $str)){ 
    print "Match was found!"; 
} 
else { 
    print "No match was found. :("; 
} 

This will display:

Match was found! 

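Unfortunately, I can't show you a demo, because neither ideone.com nor codepad.org allows me to use file_get_contents, but it works from my own server.

If you run into a situation where file_get_contents is not allowed, you can do as miglio said and use cURL to fetch the remote source. The rest is the same: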
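
A side note on the SimpleXML attempt: "xmlParseEntityRef: no name" is what an XML parser reports when it meets a bare "&", and inline JavaScript is full of those. If you would rather look the div up by its id than match raw text, PHP's DOMDocument::loadHTML() is more forgiving with real-world HTML. The following is only a rough sketch under that assumption. The helper has_element_with_id() and the $sample markup are made up for illustration, and the snippet cannot promise that the id is present in the HTML that YouTube actually serves.

```php
<?php
// Hypothetical helper (not from the original answers): returns true when
// the HTML contains an element whose id attribute equals $id.
function has_element_with_id(string $html, string $id): bool
{
    // loadHTML() rejects an empty string, so treat it as "not found".
    if ($html === '') {
        return false;
    }

    // Collect parser warnings internally instead of printing them;
    // pages in the wild are rarely perfectly well-formed.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return false;
    }

    // XPath lookup on the id attribute.
    // Assumes $id itself contains no double quotes.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(sprintf('//*[@id="%s"]', $id));

    return $nodes !== false && $nodes->length > 0;
}

// Made-up sample with a bare "&&" inside a script, similar in spirit
// to the page from the question.
$sample = '<html><body><script>var ok = (a && b);</script>'
        . '<div id="action-panel-details">details</div></body></html>';

echo has_element_with_id($sample, 'action-panel-details')
    ? "it's there"
    : "not found";
```

For the real page you would pass the string returned by file_get_contents() instead of $sample.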

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, 'https://www.youtube.com/watch?v=5XR7naZ_zZA'); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
$str = curl_exec($ch); 
curl_close($ch); 


if(preg_match("~watch-time-text~", $str)){ 
    print "Match was found!"; 
} 
else { 
    print "No match was found. :("; 
} 
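
One thing the cURL snippet above does not cover is a failed request: curl_exec() returns false on error, and a non-200 response may carry a page without the text you are looking for. Here is a hedged sketch of how that could be handled. fetch_page() and page_mentions() are hypothetical names, and the redirect and timeout settings are my own assumptions rather than anything the original code needs.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body,
// or null when the transfer fails or the HTTP status is not 200.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects (assumption)
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // illustrative timeout in seconds

    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released when $ch goes out of scope.

    if ($body === false || $status !== 200) {
        return null;
    }

    return $body;
}

// Hypothetical helper: plain substring check that tolerates a failed fetch.
function page_mentions(?string $html, string $marker): bool
{
    return $html !== null && strpos($html, $marker) !== false;
}

// Usage (performs a network request, so it is left commented out):
// $str = fetch_page('https://www.youtube.com/watch?v=5XR7naZ_zZA');
// print page_mentions($str, 'watch-time-text')
//     ? "Match was found!"
//     : "No match was found. :(";
```

Since the marker is a fixed string, strpos() is enough here; preg_match() works too, as in the code above.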

Thank you very much. The first solution worked for me. – bethbee


Maybe use cURL:

//$url = 'https://www.youtube.com/'; 
$url = "https://www.youtube.com/watch?v=5XR7naZ_zZA"; 
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
// Note: this turns off TLS certificate verification; fine for a quick test, unsafe in production.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
$content = curl_exec($ch); 
curl_close($ch); 

if(preg_match("~watch-time-text~", $content)){ 
    echo "it's there"; 
}else{ 
    echo 'is another page'; 
} 

To print the document source:
echo "<pre>".htmlentities($content)."</pre>"; 

The match, with the HTML code around 'watch-time-text':
<div id="action-panel-details" class="action-panel-content yt-uix-expander 
yt-uix-expander-collapsed yt-card yt-card-has-padding"> 
<div id="watch-description" class="yt-uix-button-panel"> 
<div id="watch-description-content"> 
<div id="watch-description-clip"><span id="watch-description-badges"></span> 
<div id="watch-uploader-info"><strong class="watch-time-text"> 
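
Dumping the whole document is a lot to scroll through. If you only want to see the markup around the match, something along these lines might help. It is just a sketch: excerpt_around() is a hypothetical helper, the $sample string is made up, and the radius is counted in bytes, so a multibyte character at either edge could be cut.

```php
<?php
// Hypothetical helper: return an HTML-escaped excerpt of up to $radius
// bytes on each side of the first occurrence of $marker, or null when
// the marker is absent.
function excerpt_around(string $html, string $marker, int $radius = 40): ?string
{
    $pos = strpos($html, $marker);
    if ($pos === false) {
        return null;
    }

    // Clamp the start so the excerpt never begins before the string does.
    $start = max(0, $pos - $radius);
    $length = ($pos - $start) + strlen($marker) + $radius;

    // Escape the excerpt so the browser shows the tags instead of rendering them.
    return htmlspecialchars(substr($html, $start, $length));
}

// Made-up sample based on the markup shown above.
$sample = '<div id="watch-uploader-info">'
        . '<strong class="watch-time-text">Published</strong></div>';

$excerpt = excerpt_around($sample, 'watch-time-text');
echo $excerpt === null ? 'is another page' : "<pre>" . $excerpt . "</pre>";
```

With the real page you would pass $content instead of $sample.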

Thanks for your reply. – bethbee
