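I am trying to scrape the following page with Simple HTML DOM Parser, but it suddenly stops: http://mangafox.me/manga/
I want the script to follow each of those links and scrape the details of each manga, and for the most part my code does exactly that. It works, but for some reason the page stops loading midway (it doesn't even get through the # list).
There are no error messages, so I don't know what I'm looking for. I would appreciate any advice on what I'm doing wrong.
Code: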
<?php
include('simple_html_dom.php');
set_time_limit(0);
//ini_set('max_execution_time', 300);

//Creates an instance of the simple_html_dom class
$html = new simple_html_dom();

//Loads the page from the URL entered
$html->load_file('http://mangafox.me/manga');

//Finds an element and if there is more than 1 instance the variable becomes an array
$manga_urls = $html->find('.manga_list a');

//Function which retrieves information needed to populate the DB from individual manga pages.
function getmanga($value, $url){
    $pagehtml = new simple_html_dom();
    $pagehtml->load_file($url);
    if ($value == 'desc') {
        $description = $pagehtml->find('p.summary');
        foreach($description as $d){
            //return $d->plaintext;
            return $desc = $d->plaintext;
        }
        unset($description);
    } else if ($value == 'status') {
        $status = $pagehtml->find('div[class=data] span');
        foreach ($status as $s) {
            $status = explode(",", $s->plaintext);
            return $status[0];
        }
        unset($status);
    } else if ($value == 'genre') {
        $genre = $pagehtml->find('//*[@id="title"]/table/tbody/tr[2]/td[4]');
        foreach ($genre as $g) {
            return $g->plaintext;
        }
        unset($genre);
    } else if ($value == 'author') {
        $author = $pagehtml->find('//*[@id="title"]/table/tbody/tr[2]/td[2]');
        foreach ($author as $a) {
            return $a->plaintext;
        }
        unset($author);
    } else if ($value == 'release') {
        $release = $pagehtml->find('//*[@id="title"]/table/tbody/tr[2]/td[1]');
        foreach ($release as $r) {
            return $r->plaintext;
        }
        unset($release);
    } else if ($value == 'image') {
        $image = $pagehtml->find('.cover img');
        foreach ($image as $i) {
            return $i->src;
        }
        unset($image);
    }
    $pagehtml->clear();
    unset($pagehtml);
}

foreach($manga_urls as $url) {
    $href = $url->href;
    if (strpos($href, 'http') !== false){
        echo 'Title: ' . $url->plaintext . '<br />';
        echo 'Link: ' . $href . '<br />';
        echo 'Description: ' . getmanga('desc', $href) . '<br />';
        echo 'Status: ' . getmanga('status',$href) . '<br />';
        echo 'Genre: ' . getmanga('genre', $href) . '<br />';
        echo 'Author: ' . getmanga('author', $href) . '<br />';
        echo 'Release: ' . getmanga('release', $href) . '<br />';
        echo 'Image Link: ' . getmanga('image', $href) . '<br />';
        echo '<br /><br />';
    }
}

$html->clear();
unset($html);
?>
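Since the script stops without printing anything, here is a minimal diagnostic sketch that could go at the top of the script. It assumes the silent stop is a fatal error that is simply not being displayed, which is only a guess. The helper name memory_report is hypothetical and not part of Simple HTML DOM; error_reporting, ini_set, memory_get_usage and memory_get_peak_usage are standard PHP functions.

```php
<?php
// Diagnostic sketch only; not part of the original script.
error_reporting(E_ALL);          // report every class of error
ini_set('display_errors', '1');  // print errors in the page output instead of only logging them

// Hypothetical helper: formats current and peak memory use so it can be echoed per iteration.
function memory_report() {
    return sprintf(
        'memory: %.1f MB (peak %.1f MB, limit %s)',
        memory_get_usage(true) / 1048576,
        memory_get_peak_usage(true) / 1048576,
        ini_get('memory_limit')
    );
}

echo memory_report() . '<br />';
```

Echoing memory_report() once per manga inside the main loop would show whether memory use keeps climbing until the point where the output stops.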
Turn on error reporting and show us the results, please. – 2014-10-30 16:11:27
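Do you mean the error_log file? There is nothing in it (apart from an error I got earlier, e.g. [30-Oct-2014 10:45:15 America/Chicago] PHP Fatal error: Maximum execution time of 30 seconds exceeded in /home1/hashmkb/public_html/manga/simple_html_dom.php on line 1622). Because of that, I put set_time_limit(0); in the code. – hash004 2014-10-30 16:21:20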
Which I solved by adding set_time_limit(0); unless I did that wrong :S – hash004 2014-10-30 16:41:17