2011-05-03 145 views
2

I want to parse the page at http://dizli.com/dizli/db.html using PHP, but I get errors when loading the HTML with DOMDocument.

However, when I run this code,

$url = "http://dizli.com/dizli/db.html"; 
$dom = new DOMDocument(); 
$html = $dom->loadHTMLFile($url); 
$dom->preserveWhiteSpace = false; 
$tables = $dom->getElementsByTagName('table'); 
$tr = $tables->item(2)->getElementsByTagName('tr'); 
$rows = $tables->item(0)->getElementsByTagName('td'); 

foreach($rows as $row) 
{ 
    $movie = $row->getElementsByTagName('b'); 
    echo $movie;} 

I get a string of errors:

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and td in http://dizli.com/dizli/db.html, line: 54 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and b in http://dizli.com/dizli/db.html, line: 81 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and b in http://dizli.com/dizli/db.html, line: 106 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: no name in http://dizli.com/dizli/db.html, line: 115 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and b in http://dizli.com/dizli/db.html, line: 126 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and font in http://dizli.com/dizli/db.html, line: 126 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and b in http://dizli.com/dizli/db.html, line: 128 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: no name in http://dizli.com/dizli/db.html, line: 1575 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Tag blink invalid in http://dizli.com/dizli/db.html, line: 2190 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and b in http://dizli.com/dizli/db.html, line: 2200 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and font in http://dizli.com/dizli/db.html, line: 2200 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: body and center in http://dizli.com/dizli/db.html, line: 2225 in C:\development\app_server\C7\Lib\Tools\News.php on line 93 

Catchable fatal error: Object of class DOMNodeList could not be converted to string in C:\development\app_server\C7\Lib\Tools\News.php on line 102 

Can anyone help me parse this page so that I can save the movie names and the directors' names?

Thanks in advance. Zeeshan

+0

Somewhat related - http://stackoverflow.com/questions/1148928/disable-warnings-when-loading-non-well-formed-html-by-domdocument-php – Phil 2011-05-03 23:23:46

Answers

1

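The page is written in very old HTML (you can tell from the FONT tags, the uppercase tag names, and so on), so some tags, and possibly paragraphs and other elements, are never closed. In this case I would suggest using regular expressions to find what you need.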
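As a rough illustration of the regular-expression approach, here is a minimal sketch. The sample markup is hypothetical (it is not copied from the actual page), and the pattern assumes the names sit inside B tags. Matching HTML with a regex is fragile, so treat this as a starting point only.

```php
<?php
// Hypothetical old-style markup with unclosed TD and FONT tags.
$html = '<TD><FONT size=2><B>Movie One</B><TD><FONT size=2><b>Movie Two</b>';

// Match the contents of every <b>...</b> pair, case-insensitively.
// \b after "b" keeps the pattern from matching <body>, <blink> or <br>.
preg_match_all('~<b\b[^>]*>(.*?)</b>~is', $html, $matches);

// Strip any nested tags and surrounding whitespace from each captured name.
$names = array_map('trim', array_map('strip_tags', $matches[1]));

echo implode("\n", $names), "\n";
```

The lazy `(.*?)` stops at the first closing tag, so each name is captured separately rather than as one long match.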

1

Your main problem is the last line:

echo $movie; 

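$movie is an instance of DOMNodeList, so you can't just echo it; you need to get at its elements, for example with $movie->item(0).

You can also do a var_dump($movie) to see what you get.

The warnings can probably be ignored, depending on the output you get.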
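A minimal sketch of that fix, assuming a small inline HTML sample (hypothetical, standing in for the real page so the snippet runs without network access). Instead of echoing the DOMNodeList, it loops over the list and reads each node's textContent.

```php
<?php
// Hypothetical sample markup; the real page's structure may differ.
$sample = '<table><tr><td><b>Movie One</b></td><td><b>Movie Two</b></td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($sample);

$titles = [];
foreach ($dom->getElementsByTagName('td') as $cell) {
    // getElementsByTagName() returns a DOMNodeList, so iterate over it
    // rather than echoing it directly.
    foreach ($cell->getElementsByTagName('b') as $bold) {
        $titles[] = trim($bold->textContent);
    }
}

echo implode("\n", $titles), "\n";
```

If only the first match per cell is needed, `$cell->getElementsByTagName('b')->item(0)` returns a single node, or null when the cell has no b element, so check for null before reading textContent.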

2

To hide the errors and still have the code work, just add @ before $dom, like this:

$html = @$dom->loadHTMLFile($url); 
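An alternative to the @ operator, which is a different technique from the one in this answer, is libxml_use_internal_errors(). It makes libxml collect parse errors internally instead of raising PHP warnings, so they can still be inspected. The sketch below uses a hypothetical malformed snippet rather than the real page.

```php
<?php
// Hypothetical malformed markup: mismatched closing tags and a bare "&".
$sample = '<table><tr><td><font><b>Tom & Jerry</font></b></td></tr></table>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($sample);

// Fetch whatever the parser complained about, then clear the buffer
// and restore the previous error-handling setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// The document is still usable despite the parse errors.
$bold = $dom->getElementsByTagName('b')->item(0);
echo $bold !== null ? $bold->textContent : 'no <b> found', "\n";
```

Unlike @, which suppresses every error raised by the expression it prefixes, this only affects libxml's own diagnostics, and the exact messages collected depend on the installed libxml version.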
+0

Why does this work? What is the @ operator? Could you explain it? – 2015-08-20 09:17:44