Loop through all children of an element with DOMDocument and extract the text content

This is the structure of an XML file (an ODT file) that I'm trying to parse:

<office:body> 
    <office:text> 
     <text:h text:style-name="P1" text:outline-level="2">Chapter 1</text:h> 
      <text:p text:style-name="Standard">Lorem ipsum. </text:p> 

      <text:h text:style-name="Heading3" text:outline-level="3">Subtitle 2</text:h> 
       <text:p text:style-name="Standard"><text:span text:style-name="T5">10</text:span><text:span text:style-name="T6">:</text:span><text:s/>Text (100%)</text:p> 
        <text:p text:style-name="Explanation">Further informations.</text:p> 
       <text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p> 
        <text:p text:style-name="Explanation">Further informations.</text:p> 
       <text:p text:style-name="Standard"><text:span text:style-name="T9">9.1:</text:span><text:s/>Text (91%)</text:p> 
        <text:p text:style-name="Explanation">Further informations.</text:p> 
        <text:p text:style-name="Explanation">More furter informations.</text:p> 
    </office:text> 
</office:body>

With XMLReader I'm doing it like this:

$reader = new XMLReader();
$reader->open('zip://content.odt#content.xml');
$html = '';

while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name === 'text:h') {
        // Only level-2 headings become <h2>
        if ($reader->getAttribute('text:outline-level') == "2") {
            $html .= '<h2>' . $reader->expand()->textContent . '</h2>';
        }
    }
    elseif ($reader->nodeType == XMLReader::ELEMENT && $reader->name === 'text:p') {
        if ($reader->getAttribute('text:style-name') == "Standard") {
            $html .= '<p>' . $reader->readInnerXML() . '</p>';
        }
        else {
            // Doing something different
        }
    }
}
echo $html;

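Now I'd like to do the same thing with DOMDocument, but I need some help with the syntax. How do I loop through all children of office:text? While looping through the nodes, I would use if/else to decide what to do (text:h vs. text:p).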

I also need to replace every text:s element (if there is one inside a text:p element) with a space...
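A minimal sketch of that replacement with DOMDocument and DOMXPath. It assumes the standard ODF text namespace URI (`urn:oasis:names:tc:opendocument:xmlns:text:1.0`) and honours the optional `text:c` attribute, which in ODF gives the number of spaces (default 1). The helper name `replaceSpaceElements` is hypothetical, and the sample XML is a trimmed copy of the one above with namespace declarations added.

```php
<?php
// Assumption: the standard ODF text namespace URI.
const TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

// Hypothetical helper: replace every <text:s/> inside a <text:p> with spaces.
function replaceSpaceElements(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('text', TEXT_NS);

    // Copy the matches into an array before modifying the tree.
    $spaces = iterator_to_array($xpath->query('//text:p//text:s'), false);
    foreach ($spaces as $s) {
        // text:c holds the number of spaces; default to 1 when it is absent.
        $count = $s->hasAttributeNS(TEXT_NS, 'c')
            ? max(1, (int) $s->getAttributeNS(TEXT_NS, 'c'))
            : 1;
        $s->parentNode->replaceChild($doc->createTextNode(str_repeat(' ', $count)), $s);
    }
}

// Usage on a trimmed copy of the question's XML.
$doc = new DOMDocument();
$doc->loadXML(
    '<office:document-content'
    . ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    . ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    . '<office:body><office:text>'
    . '<text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p>'
    . '</office:text></office:body></office:document-content>'
);
replaceSpaceElements($doc);
echo $doc->getElementsByTagNameNS(TEXT_NS, 'p')->item(0)->textContent, PHP_EOL;
```

Running the replacement before extracting text matters because an empty text:s element contributes no characters to `textContent`.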

$reader = new DOMDocument(); 
$reader->preserveWhiteSpace = false; 
$reader->load('zip://content.odt#content.xml'); 

$body = $reader->getElementsByTagName('office:text')->item(0); 
foreach($body->childNodes as $node) echo $node->nodeName . PHP_EOL;

Or would it be smarter to loop through all text elements? If so, the question remains how to do that.

$elements = $reader->getElementsByTagName('text'); 
foreach($elements as $node){ 
    foreach($node->childNodes as $child) { 
     echo $child->nodeName.': '; 
     echo $child->nodeValue.'<br>'; 
     // check for type... 
    } 
}
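If the goal is to visit every element in the text namespace, one option is `DOMDocument::getElementsByTagNameNS()` with the namespace URI and `'*'` as the local name, which returns the matching elements in document order. A sketch, again assuming the standard ODF namespace URIs; branching on `localName` avoids depending on the `text:` prefix chosen in the file.

```php
<?php
// Assumption: the standard ODF text namespace URI.
const TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

$doc = new DOMDocument();
$doc->loadXML(
    '<office:document-content'
    . ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    . ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    . '<office:body><office:text>'
    . '<text:h text:outline-level="2">Chapter 1</text:h>'
    . '<text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p>'
    . '</office:text></office:body></office:document-content>'
);

$seen = [];
// '*' matches every local name in the given namespace.
foreach ($doc->getElementsByTagNameNS(TEXT_NS, '*') as $element) {
    switch ($element->localName) {
        case 'h':
            $seen[] = 'heading: ' . $element->textContent;
            break;
        case 'p':
            $seen[] = 'paragraph: ' . $element->textContent;
            break;
        default:
            // Any other text:* element, e.g. text:s
            $seen[] = 'other: ' . $element->localName;
    }
}
echo implode(PHP_EOL, $seen), PHP_EOL;
```

Note that this also yields elements nested inside paragraphs (such as text:s or text:span), so the `default` branch has to decide what to do with them.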


2014-11-02 user3142695

One of the simplest ways to do this with DOMDocument is with the help of DOMXPath.

Taking your question literally:

How do I loop through all children of office:text?

This can be expressed as an XPath expression:

//office:text/child::node()

But your wording is slightly off here. You want not only all the children, but also the children's children and so on, that is, all descendants:

//office:text/descendant::node()

Or, in abbreviated syntax:

//office:text//node()
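A small sketch of the difference, run on a whitespace-free excerpt of the question's XML (the standard ODF namespace URIs are assumed). Without whitespace between the elements, no whitespace-only text nodes appear in the results. The helper `nodeNames()` is hypothetical.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<office:document-content'
    . ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    . ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    . '<office:body><office:text>'
    . '<text:h text:outline-level="2">Chapter 1</text:h>'
    . '<text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p>'
    . '</office:text></office:body></office:document-content>'
);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('office', 'urn:oasis:names:tc:opendocument:xmlns:office:1.0');

// Collect the nodeName of every node a query returns ('#text' for text nodes).
function nodeNames(DOMNodeList $list): array
{
    $out = [];
    foreach ($list as $node) {
        $out[] = $node->nodeName;
    }
    return $out;
}

$children    = nodeNames($xpath->query('//office:text/child::node()'));
$descendants = nodeNames($xpath->query('//office:text//node()'));

echo implode(', ', $children), PHP_EOL;    // direct children only
echo implode(', ', $descendants), PHP_EOL; // also grandchildren and text nodes
```

The child axis stops at text:h and text:p, while the descendant query also returns their text nodes and the nested text:s element.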

Compare: XPath to Get All ChildNodes and not the Parent Node

To loop over this in PHP, you need to register a namespace for the office prefix and then iterate over the XPath result:

$xpath = new DOMXPath($reader);
$xpath->registerNamespace('office', $xml_namespace_uri_of_office_namespace);

$descendants = $xpath->query('//office:text//node()'); 
foreach ($descendants as $node) { 
    // $node is a DOMNode, e.g. a DOMElement, DOMText, ...
}
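Filled in, the loop above can reproduce the question's XMLReader logic. This is a sketch under stated assumptions: the standard ODF namespace URIs (`urn:oasis:names:tc:opendocument:xmlns:office:1.0` and `...:text:1.0`), only the text:h / text:p cases from the question, and `textContent` for paragraphs instead of `readInnerXML()`, so inline text:span markup is flattened. `odtToHtml()` is a hypothetical name.

```php
<?php
// Assumption: standard ODF namespace URIs.
const OFFICE_NS = 'urn:oasis:names:tc:opendocument:xmlns:office:1.0';
const TEXT_NS   = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

// Hypothetical helper: DOMXPath version of the question's XMLReader loop.
function odtToHtml(DOMDocument $doc): string
{
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('office', OFFICE_NS);

    $html = '';
    foreach ($xpath->query('//office:text//node()') as $node) {
        // Text nodes are read via their parent's textContent, so skip them.
        if (!$node instanceof DOMElement || $node->namespaceURI !== TEXT_NS) {
            continue;
        }
        if ($node->localName === 'h') {
            if ($node->getAttributeNS(TEXT_NS, 'outline-level') === '2') {
                $html .= '<h2>' . htmlspecialchars($node->textContent) . '</h2>';
            }
        } elseif ($node->localName === 'p') {
            if ($node->getAttributeNS(TEXT_NS, 'style-name') === 'Standard') {
                $html .= '<p>' . htmlspecialchars($node->textContent) . '</p>';
            } else {
                // Doing something different, e.g. for "Explanation" paragraphs
            }
        }
    }
    return $html;
}

// With the real file, as in the question: $doc->load('zip://content.odt#content.xml');
$doc = new DOMDocument();
$doc->loadXML(
    '<office:document-content'
    . ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    . ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    . '<office:body><office:text>'
    . '<text:h text:style-name="P1" text:outline-level="2">Chapter 1</text:h>'
    . '<text:p text:style-name="Standard">Lorem ipsum. </text:p>'
    . '<text:p text:style-name="Explanation">Further informations.</text:p>'
    . '</office:text></office:body></office:document-content>'
);
echo odtToHtml($doc), PHP_EOL;
```

Comparing `namespaceURI` and `localName` instead of `nodeName` keeps the check working even if a file uses a different prefix for the text namespace.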

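XPath in general does not guarantee any particular order, but PHP's libxml-based library does return the nodes in document order. That is the order you are looking for.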
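Document order here is the order of a depth-first, pre-order walk of the tree: a node first, then its children from left to right. The sketch below, on a whitespace-free excerpt with the standard ODF office namespace URI assumed, collects the nodes both ways and checks that they line up. `walkInOrder()` is a hypothetical helper.

```php
<?php
$officeNs = 'urn:oasis:names:tc:opendocument:xmlns:office:1.0';

$doc = new DOMDocument();
$doc->loadXML(
    '<office:document-content'
    . ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    . ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    . '<office:body><office:text>'
    . '<text:h text:outline-level="2">Chapter 1</text:h>'
    . '<text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p>'
    . '</office:text></office:body></office:document-content>'
);

// Pre-order depth-first walk: record a node, then recurse into its children.
function walkInOrder(DOMNode $node, array &$out): void
{
    foreach ($node->childNodes as $child) {
        $out[] = $child;
        if ($child->hasChildNodes()) {
            walkInOrder($child, $out);
        }
    }
}

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('office', $officeNs);
$fromXPath = iterator_to_array($xpath->query('//office:text//node()'), false);

$fromWalk = [];
walkInOrder($doc->getElementsByTagNameNS($officeNs, 'text')->item(0), $fromWalk);

// Same length and the same node at every position means the same order.
$sameOrder = count($fromXPath) === count($fromWalk);
foreach ($fromWalk as $i => $node) {
    $sameOrder = $sameOrder && $node->isSameNode($fromXPath[$i]);
}
echo $sameOrder ? 'same order' : 'different order', PHP_EOL;
```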

Compare: XPath query result order


2014-11-02 10:23:35 hakre
