How to get elements from a page using Simple HTML DOM Parser

I want to parse an HTML page with Simple HTML DOM Parser. The page doesn't use IDs, which makes it harder to reference elements.

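On this page I'm trying to get the album names, song titles, download links and album images. This is as far as I've got, but I can't even get the album name!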

include 'simple_html_dom.php';

$html = file_get_html('http://music.banadir24.com/singer/aasha_abdoo/247.html');

$article = $html->find('table td[class=title]', 0);

foreach($article as $link){
    echo $link;
}

This outputs: 1tdArrayArrayArray Artist Array
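A likely explanation for that output: with 0 as its second argument, find() returns a single element rather than a list, so the foreach walks over that element object's public properties (numbers, strings and arrays, each array printing as "Array") instead of over matching cells. The sketch below reproduces the effect in plain PHP using a hypothetical stand-in class, FakeNode; it is not the real Simple HTML DOM node class, whose properties may differ.

```php
<?php
// Hypothetical stand-in for a parsed element; NOT the real simple_html_dom_node class.
class FakeNode {
    public $nodetype = 1;
    public $tag = 'td';
    public $attr = ['class' => 'title'];
}

$node = new FakeNode();
$parts = [];
// foreach over an object iterates its public properties, not its child elements.
foreach ($node as $name => $value) {
    // Arrays are rendered as the literal string "Array", as echo would show them.
    $parts[] = is_array($value) ? 'Array' : (string) $value;
}
echo implode('', $parts), "\n"; // prints 1tdArray
```

Dropping the 0, so that find() returns every matching cell, and echoing $link->plaintext inside the loop is the usual way to get one title per iteration.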

I need output like this:

Image Path 
Duniya Jamiila [URL] 
Macaan Badnoo [URL] 
Donimaayee  [URL] 
...

Thanks for any help.

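Please note: this is legal, because the songs are not under copyright and can be downloaded for free. I just need to download a lot of them, and I can't sit there clicking buttons all day. Having said all that, it took me an hour to get this far.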

2010-06-02 user356556

Try print_r($link); inside your loop to learn more about the arrays. – 2010-06-02 15:21:18

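If you want to download multiple files from one page, you might want to look at the Firefox plugin "DownThemAll!". It's a very useful tool, and for a problem like this it requires zero programming :) – 2ndkauboy 2010-06-02 15:28:09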

@Kau - I use that too, but I want the files placed in directories, sorted in a sensible way. – user356556 2010-06-02 16:34:13

Is this what you mean?

$urls = $html->find('table[width=100%] table tr'); 
foreach($urls as $url){ 

    echo $url->children(2); 
    echo $url->children(6)->children(0)->href; 
    echo '<br>'; 
}

Edit

Using Simple HTML DOM.

Following your comment, here is some updated code with (hopefully) helpful comments.

$urls = $html->find('table[width=100%] table tr');
foreach($urls as $url){
    // Check that we actually have the right number of children; this was what was breaking before
    if ($url->children(6)) {
        /* Without the following check, we get a digg icon and a useless link. You can merge this with the if statement above; I only have it
         * separated so that I can write this comment and it will make more sense when reading it for the first time.
         */
        $first = $url->children(2)->children(0); // may be null if the cell has no child element
        if ($first && ($first->src == 'images/digg.png' || $first->href == 'javascript:void(0)')) continue;
        // echo out the name of the artist. You can get the text without the link by using $url->children(2)->plaintext
        echo $url->children(2);
        // echo out the link. Obviously you could put this href inside a <a href="code-here">whatever-here</a> tag to make the links clickable.
        $link = $url->children(6)->children(0);
        if ($link) echo $link->href;
        echo '<br>'; // just for readability
    }
}

2010-06-02 15:39:46

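That's what I meant, and so concise! But how do I get it to move on to the next album? For me it seems to just stop after the last song title of the first album, complaining 'Call to a member function children() on a non-object'. – user356556 2010-06-02 15:43:47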

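That's because '$url' has no child nodes (or perhaps fewer than 7), so you need to check that the child actually exists before making the call. Try to work it out yourself (and please post your answer if you do, in case it helps someone in future); otherwise, if I get some time tomorrow, I'll think about it some more. – 2010-06-02 18:09:09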
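The check described in that comment can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM (a swapped-in library, chosen so the example runs on its own). nthElementChild() is a hypothetical helper that returns null instead of failing when a row has too few cells, and the markup is invented: one track row shaped like the rows the answer indexes into (title in cell 2, link in cell 6) and one short row of the kind that caused the "on a non-object" error.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): return the $n-th element
// child (0-based) of $node, or null if it has fewer element children.
function nthElementChild(DOMNode $node, int $n): ?DOMElement {
    $i = 0;
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            if ($i === $n) {
                return $child;
            }
            $i++;
        }
    }
    return null;
}

// Invented markup: one 7-cell track row and one short album-header row.
$html = '<table>'
      . '<tr><td>1</td><td>x</td><td>Duniya Jamiila</td><td></td><td></td><td></td>'
      . '<td><a href="/dl/1.mp3">Download</a></td></tr>'
      . '<tr><td colspan="7">Desco</td></tr>'
      . '</table>';

libxml_use_internal_errors(true); // real-world pages often trigger parser warnings
$doc = new DOMDocument();
$doc->loadHTML($html);

$tracks = [];
foreach ($doc->getElementsByTagName('tr') as $row) {
    $cell = nthElementChild($row, 6);
    $link = $cell ? nthElementChild($cell, 0) : null;
    if ($link === null) {
        continue; // row too short: skip it instead of calling methods on null
    }
    $tracks[] = [trim(nthElementChild($row, 2)->textContent), $link->getAttribute('href')];
}

foreach ($tracks as [$title, $url]) {
    echo "$title $url\n"; // prints: Duniya Jamiila /dl/1.mp3
}
```

The short row is skipped rather than aborting the loop, so rows belonging to later albums are still reached.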

The page used in your example has only three TD tags whose class attribute is "title":

1. <td height="35" class="title" style="padding-left:7px;"> Artist</td> 
2. <td colspan="3" height="35" class="title" style="padding-left:7px;"><img src="images/b24/dot_next.png" />Desco</td> 
3. <td colspan="3" height="35" class="title" style="padding-left:7px;"><img src="images/b24/dot_next.png" />The Best Of Aasha</td>

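The first always contains the text "Artist"; the others contain the album titles. They are also the only TD tags with both class="title" and colspan="3", so selecting them with the HTML DOM parser should be easy.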
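As a sketch of that filter, here it is expressed with PHP's built-in DOMDocument and DOMXPath, swapped in for Simple HTML DOM so the example runs without the library. The input is only the three cells quoted above wrapped in a minimal table, not the full page. The XPath condition requires both attributes, which is what excludes the "Artist" cell.

```php
<?php
// The three title cells quoted above, wrapped in a minimal table for parsing.
$html = '<table><tr>'
      . '<td height="35" class="title" style="padding-left:7px;"> Artist</td>'
      . '</tr><tr>'
      . '<td colspan="3" height="35" class="title" style="padding-left:7px;"><img src="images/b24/dot_next.png" />Desco</td>'
      . '</tr><tr>'
      . '<td colspan="3" height="35" class="title" style="padding-left:7px;"><img src="images/b24/dot_next.png" />The Best Of Aasha</td>'
      . '</tr></table>';

libxml_use_internal_errors(true); // suppress warnings from imperfect real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$albums = [];
// Only the album headers have both class="title" and colspan="3".
foreach ($xpath->query('//td[@class="title" and @colspan="3"]') as $td) {
    $albums[] = trim($td->textContent); // the <img> contributes no text
}
echo implode("\n", $albums), "\n";
```

To run it against the real page, the literal string would be replaced by the downloaded HTML, for example the result of file_get_contents() on the album URL.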

2010-06-02 15:23:26 2ndkauboy
