Getting a snippet of text with simplehtmldom

I'm trying to extract some text using the simplehtmldom script. The HTML structure is as follows:

<div id="posts"> 
    <div align="center"> 
    <SEVERAL LEVELS OF HTML> 
     <strong>XXX</strong> 
    </SEVERAL LEVELS OF HTML> 
    </div> 
    <div align="center"> 
    <SEVERAL LEVELS OF HTML> 
     <strong>IGNORE</strong> 
    </SEVERAL LEVELS OF HTML> 
    </div> 
    <div align="center"> 
    <SEVERAL LEVELS OF HTML> 
     <strong>IGNORE</strong> 
    </SEVERAL LEVELS OF HTML> 
    </div> 
</div>

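What I want is the string XXX: the text of the first <strong> tag inside the first <div> with the attribute align="center", which sits inside the <div> with id="posts". I'm not interested in the text of the other <div align="center"> tags.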

The "SEVERAL LEVELS OF HTML" consist of messy nested tables and the like.

My code: I use a descendant selector, since I obviously have to "jump" over several levels of HTML. Is that why my print_r shows "Trying to get property of non-object"?

$html = file_get_html($page_1); 
$es = $html->find('div#posts div[align=center] strong'); 
print_r($es->plaintext); die;

Oddly, the following statement gives the same "Trying to get property of non-object" result. What am I doing wrong?

$es = $html->find('div#posts');


2011-02-09 stef

Two possible causes:

In $html = file_get_html($page_1);, $page_1 may not be a URL. If it is a string containing HTML, use str_get_html instead, for example: $html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');
The HTML contains more than one div#posts (which it shouldn't).
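One more thing worth checking: in simplehtmldom, find() called without an index returns an array of elements, so $es->plaintext reads a property of an array, which also produces "Trying to get property of non-object". Passing an index, as in $html->find('div#posts div[align=center] strong', 0)->plaintext, returns only the first match (or null when nothing matches). For comparison, below is a minimal sketch of the same lookup using PHP's built-in DOMDocument and DOMXPath instead of simplehtmldom. The function name firstPostStrong and the sample markup are hypothetical stand-ins for the real page, and it assumes the HTML is already in a string (DOMDocument::loadHTMLFile() would be the counterpart of file_get_html() for a file or URL).

```php
<?php
// Sketch using PHP's built-in DOM extension, NOT simplehtmldom.
// Assumption: the page HTML is already held in a string.

/**
 * Hypothetical helper: returns the text of the first <strong> inside the
 * first <div align="center"> that is a direct child of <div id="posts">,
 * or null if no such element exists.
 */
function firstPostStrong(string $markup): ?string
{
    $doc = new DOMDocument();
    // Suppress libxml warnings caused by messy markup (nested tables, etc.).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // div[@align="center"][1] picks the first matching child div;
    // the outer (...)[1] keeps only the first <strong> in document order.
    $nodes = $xpath->query(
        '(//div[@id="posts"]/div[@align="center"][1]//strong)[1]'
    );

    // query() returns a DOMNodeList (or false on a bad expression);
    // check it before reading a node, instead of assuming an object exists.
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Simplified stand-in for the structure in the question.
$markup = <<<HTML
<div id="posts">
  <div align="center"><table><tr><td><strong>XXX</strong></td></tr></table></div>
  <div align="center"><strong>IGNORE</strong></div>
  <div align="center"><strong>IGNORE</strong></div>
</div>
HTML;

echo firstPostStrong($markup), PHP_EOL; // prints XXX
```

The XPath uses a direct-child step (/div) under div#posts so that any div[align=center] buried in the nested tables is not mistaken for the first post; drop to // if the real markup nests the post divs more deeply.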


2011-02-09 10:40:42 Shikiryu
