2010-12-13 307 views
4
$html = file_get_contents("test.html"); 
$doc = new DOMDocument(); 
$doc->loadHTML($html); 
$xpath = new DOMXPath($doc); 
$body = $xpath->query('//body'); 

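I want to loop through all elements inside the body tag of an HTML file and print the "style" attribute associated with each of them. How can I do that?

Loop through all elements of the "body" tag using DOM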

+0

By "all elements", do you mean the direct children of the body element or the entire tree of elements below body? – Gordon 2010-12-13 17:10:13

+0

I mean the entire tree of elements below body :) – Teiv 2010-12-13 17:18:32

Answers

9

You can use my RecursiveDOMIterator for this:

Code (condensed)

class RecursiveDOMIterator implements RecursiveIterator
{
    protected $_position;   // index of the current child node
    protected $_nodeList;   // DOMNodeList holding the children of the wrapped node

    public function __construct(DOMNode $domNode)
    {
        $this->_position = 0;
        $this->_nodeList = $domNode->childNodes;
    }

    // Descend into the current node by wrapping it in a new iterator.
    public function getChildren() { return new self($this->current()); }
    public function key()         { return $this->_position; }
    public function next()        { $this->_position++; }
    public function rewind()      { $this->_position = 0; }

    public function valid()
    {
        return $this->_position < $this->_nodeList->length;
    }

    public function hasChildren()
    {
        return $this->current()->hasChildNodes();
    }

    public function current()
    {
        return $this->_nodeList->item($this->_position);
    }
}

Usage:

$dom = new DOMDocument; 
$dom->loadHTMLFile('http://stackoverflow.com/questions/4431142/'); 

$dit = new RecursiveIteratorIterator(
    new RecursiveDOMIterator($dom), 
    RecursiveIteratorIterator::SELF_FIRST 
); 

foreach ($dit as $node) {
    // Text and comment nodes have no attributes, so only inspect elements.
    if ($node->nodeType === XML_ELEMENT_NODE && $node->hasAttribute('style')) {
        printf(
            'Element %s - Styles: %s%s',
            $node->nodeName,
            $node->getAttribute('style'),
            PHP_EOL
        );
    }
}

Output:

Element div - Styles: margin-top: 8px; height:24px; 
Element div - Styles: margin-top: 8px; height:24px; display:none; 
Element a - Styles: font-size: 200%; margin-left: 30px; 
Element div - Styles: display:none 
Element div - Styles: display:none 
Element span - Styles: color:#FE7A15;font-size:140% 
Element span - Styles: color:#FE7A15;font-size:140% 
Element span - Styles: color:#FE7A15;font-size:140% 
Element span - Styles: color:#E8272C;font-size:140% 
Element span - Styles: color:#00AFEF;font-size:140% 
Element span - Styles: color:#969696;font-size:140% 
Element span - Styles: color:#46937D;font-size:140% 
Element span - Styles: color:#C0D0DC;font-size:140% 
Element span - Styles: color:#000;font-size:140% 
Element span - Styles: color:#dd4814;font-size:140% 
Element span - Styles: color:#9ce4fe;font-size:140% 
Element span - Styles: color:#cf4d3f;font-size:140% 
Element span - Styles: color:#f4f28d;font-size:140% 
Element span - Styles: color:#0f3559;font-size:140% 
Element span - Styles: color:#f2f2f2;font-size:140% 
Element span - Styles: color:#037187;font-size:140% 
Element span - Styles: color:#f1e7cc;font-size:140% 
Element span - Styles: color:#e1cdae;font-size:140% 
Element span - Styles: color:#a2d9f6;font-size:140% 
+1

Thank you very much for your answer. It works great and gives exactly the result I expected :) – Teiv 2010-12-13 17:29:15

0

I did it recursively, like this. I'm not sure whether it is the most efficient approach. I tried it on this web page and it works fine.

$dom = new DOMDocument(); 
$dom->loadHTML($html); 

$xpath = new DOMXPath($dom); 
$body = $xpath->query('//body')->item(0); 

recursePrintStyles($body); 

function recursePrintStyles($node)
{
    // Skip text, comment and other non-element nodes.
    if ($node->nodeType !== XML_ELEMENT_NODE)
    {
        return;
    }

    echo $node->tagName;
    echo "\t";
    echo $node->getAttribute('style');
    echo "\n";

    foreach ($node->childNodes as $childNode)
    {
        recursePrintStyles($childNode);
    }
}
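
Note that the function above prints a line for every element, because getAttribute() returns an empty string when the attribute is missing. As a rough sketch of one possible variant (the function name collectStyledElements and the sample markup are made up for this illustration and are not part of the answer), the recursion could collect only the elements that actually carry a style attribute:

```php
<?php
// Hypothetical helper: walks the subtree below $node recursively and
// returns one "tag<TAB>style" string per element that has a style attribute.
function collectStyledElements(DOMNode $node)
{
    $lines = array();

    // Text, comment and other non-element nodes contribute nothing.
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        return $lines;
    }

    if ($node->hasAttribute('style')) {
        $lines[] = $node->tagName . "\t" . $node->getAttribute('style');
    }

    foreach ($node->childNodes as $childNode) {
        $lines = array_merge($lines, collectStyledElements($childNode));
    }

    return $lines;
}

// Sample markup invented for the example.
$html = '<html><body style="margin: 0">'
      . '<div><p style="color: red">Hi</p></div>'
      . '<span>plain</span>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$body  = $dom->getElementsByTagName('body')->item(0);
$lines = collectStyledElements($body);

echo implode("\n", $lines), "\n";
```

Returning an array instead of echoing keeps the traversal separate from the output, which is a design choice of this sketch rather than something the answer prescribes.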
8

Another option is to use XPath to find only the elements that descend from <body> and have a style attribute, like this:

$dom = new DOMDocument; 
$dom->loadHTMLFile('https://stackoverflow.com/questions/4431142/'); 

$xpath = new DOMXPath($dom); 
$nodes = $xpath->query('/html/body//*[@style]'); 

foreach ($nodes as $node) {
    printf(
        'Element %s - Styles: %s%s',
        $node->nodeName,
        $node->getAttribute('style'),
        PHP_EOL
    );
}

The output is the same as in Gordon's answer, and the only important line is the $nodes = … one.
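
To make the role of the XPath expression a bit more concrete, here is a small self-contained sketch that runs against an inline HTML string instead of a live page. The markup and the helper name collectStyles are assumptions made up for this illustration. It also contrasts `//*` (the whole tree below body, which is what the question asks for) with `/*` (direct children only), the distinction raised in the comments on the question:

```php
<?php
// Sample markup invented for the example.
$html = '<html><body>'
      . '<div style="margin-top: 8px"><span style="color: #000">text</span></div>'
      . '<p>no style here</p>'
      . '</body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: returns "name: style" for every node the expression matches.
function collectStyles(DOMXPath $xpath, $expression)
{
    $found = array();
    foreach ($xpath->query($expression) as $node) {
        $found[] = $node->nodeName . ': ' . $node->getAttribute('style');
    }
    return $found;
}

// Whole tree below <body>: matches the div and the span nested inside it.
$allDescendants = collectStyles($xpath, '/html/body//*[@style]');

// Direct children of <body> only: matches just the div.
$directChildren = collectStyles($xpath, '/html/body/*[@style]');

print_r($allDescendants);
print_r($directChildren);
```

The `[@style]` predicate does the filtering inside the query itself, so no hasAttribute() check is needed in the loop.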