Converting an HTML scraper into a URL scraper

A very helpful person on Stack Overflow got me this far, but I need to convert his code from parsing an HTML string to scraping a URL. I have tried over and over and keep getting errors. Any ideas?

function getElementByIdAsString($html, $id, $pretty = true) { 
$doc = new DOMDocument(); 
// loadHTML() returns false on failure; @ hides warnings about malformed markup 
if (!@$doc->loadHTML($html)) { 
    throw new Exception("Failed to parse the HTML"); 
} 
$element = $doc->getElementById($id); 
if(!$element) { 
    throw new Exception("An element with id $id was not found"); 
} 

// get all object tags 
$objects = $element->getElementsByTagName('object'); // return node list 

// take the value of the data attribute from the first object tag 
$object = $objects->item(0); 
if (!$object) { 
    throw new Exception("No object tag was found inside element $id"); 
} 
$data = $object->getAttribute('data'); 

// cut away the unnecessary parts and return the info 
return substr($data, strpos($data, '=')+1); 

} 

// call it: 
$finalcontent = getElementByIdAsString($html, 'mainclass'); 

print_r ($finalcontent);


2015-11-19 Jamie

The errors you mention... what are they? – camelCase

The page is just blank. Is there a better way for me to see the errors? All of this is new to me. – Jamie

I simply tried to pass in a URL to scrape instead of the $html example the person on Stack Overflow used. – Jamie

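Remember to wrap the call in try/catch when you use this function, because it can throw an Exception, and an uncaught one will result in a 500 server error.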

$finalcontent = getElementByIdAsString($html, 'mainclass');

should become

try { 
    $finalcontent = getElementByIdAsString($html, 'mainclass'); 
}catch(Exception $e){ 
    echo $e->getMessage(); 
}
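
For the blank page mentioned in the comments, try/catch only surfaces exceptions. A minimal sketch for making other PHP errors visible while debugging, assuming you can edit the top of the script (these settings are not part of the original code and should be removed on a live site):

```php
<?php
// Development-only settings: print errors instead of showing a blank page.
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
```

Note that a syntax error in the same file stops the script before these lines run, so in that case the settings would have to go into php.ini or the server's error log would have to be checked instead.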


2015-11-19 17:26:38 Elijah

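Thank you so much, that got rid of the error! Now for the main problem: I need to scrape this data from a URL. How can I convert this code to read a URL instead of the $html it currently uses? – Jamie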

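Depending on your hosting, you should be able to call '$html = file_get_contents($url);'. This takes the URL you provide and tries to fetch that document's HTML. If that doesn't work, you will probably have to look into cURL, which can also fetch the page's HTML. – Elijah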

I assume it is white-screening now. Will this not work on WordPress on a custom Linode? – Jamie
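
The file_get_contents()/cURL suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name fetchHtml() is hypothetical (it does not appear in the original code), the 15-second timeout is an arbitrary choice, and whether the first branch works depends on allow_url_fopen being enabled on the host.

```php
<?php
// Hypothetical helper (not from the original thread): return a page's HTML as a string.
function fetchHtml(string $url): string
{
    // First attempt: file_get_contents(), which needs allow_url_fopen for remote URLs.
    if (ini_get('allow_url_fopen')) {
        $html = @file_get_contents($url);
        if ($html !== false) {
            return $html;
        }
    }

    // Fallback: cURL, if the extension is installed.
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // assumed timeout, in seconds
        $html = curl_exec($ch);
        if ($html !== false) {
            return $html;
        }
        throw new Exception("Failed to load $url: " . curl_error($ch));
    }

    throw new Exception("Failed to load $url");
}
```

With a helper like this, one way to wire it up would be to call '$html = fetchHtml($url);' inside the existing try block, just before 'getElementByIdAsString($html, 'mainclass')', so that a failed download is reported through the same catch.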
