2014-05-17

How do I get specific values from a DOMXPath query?

I'm new to DOMXPath, but I'd like to learn more. At the moment I have HTML structured like this:

<span class="1"> 
     <div class="headerClass"> 
      Here you have <span class="spanClass1">some text</span>. And here there is <span class="spanClass2">even more text</span> 
     </div> 
     <table class="tableClass" id="tableID"> 
      <tr> 
       <td>some text</td> 
       <td>some text</td> 
       <td>some text</td> 
      </tr> 
      <tr> 
       <td>some text</td> 
       <td>some text</td> 
       <td><a href="http://www.website1.com" target="_blank">My Link</a></td> 
      </tr> 
      <tr> 
       <td>some text</td> 
       <td>some text</td> 
       <td><a href="http://www.website2.com" target="_blank">My Link</a></td> 
      </tr> 
     </table> 
    </span> 

    <span class="2"> 
     <div class="headerClass"> 
      Here you have <span class="spanClass1">some text</span>. And here there is <span class="spanClass2">even more text</span> 
     </div> 
     <table class="tableClass" id="tableID"> 
      <tr> 
       <td>some text</td> 
       <td>some text</td> 
       <td>some text</td> 
      </tr> 
      <tr> 
       <td>some text</td> 
       <td>some text</td> 
       <td><a href="http://www.website1.com" target="_blank">My Link</a></td> 
      </tr> 
      <tr> 
       <td>some text</td> 
       <td>some text</td> 
       <td><a href="http://www.website2.com" target="_blank">My Link</a></td> 
      </tr> 
     </table> 
    </span> 

... and the spans continue: 3, 4, 5 ... etc 

To retrieve the HTML from the source file, I use this:

$oDomXpath = new DOMXpath($oDom); 
$query = "//span[number(@class)=number(@class)]"; 
$oDomObject = $oDomXpath->query($query); 

foreach ($oDomObject as $oObject) { 
    // WHAT GOES HERE???? 
} 

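I need an array that stores the following values:

  1. The plain text of every <div class="headerClass">, without HTML tags.
  2. The text of every <span class="spanClass2">.
  3. All URLs inside the table. The table can have any number of rows, from zero to many.

How can I do this? What do I need to put inside the foreach loop? Do I need to run another query?

Thank you very much for your help!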

Answers

You have options: you can run several XPath queries and fetch the values one at a time, or you can build a single XPath query out of multiple paths:

<pre><?php 
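$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect HTML parser warnings instead of silencing them with @
$dom->loadHTMLFile('yourfile.html');
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// One union query: the four paths are joined with "|" and the matches come back in document order.
// The predicate is the one from the question: it keeps only spans whose class is numeric,
// because number() yields NaN for anything else and NaN never equals itself.
$xquery = <<<'EOD'
//span[number(@class)=number(@class)]/@class |
//span[number(@class)=number(@class)]/div[@class='headerClass'] |
//span[number(@class)=number(@class)]/div[@class='headerClass']/span[@class='spanClass2'] |
//span[number(@class)=number(@class)]/table[@class='tableClass']/tr/td/a
EOD;

$nodes = $xpath->query($xquery);

foreach ($nodes as $node) {
    if ($node->nodeType == XML_ELEMENT_NODE) {
        switch ($node->nodeName) {
            case 'div':
                echo '<br/>div content: ' . $node->nodeValue;
                break;
            case 'span':
                echo '<br/>span content: ' . $node->nodeValue;
                break;
            default: // the only other element the query matches is <a>
                echo '<br/>url: ' . $node->getAttribute('href');
        }
    } else {
        // Attribute node: the numeric class of the enclosing span
        echo '<br/><br/>number: ' . $node->value;
    }
}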
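
For the array the question asks for, the first option (several queries, run relative to each numbered span) might look like the sketch below. It is only an illustration, not the answerer's code: the function name `extractBlocks`, the array keys `header`, `span2` and `urls`, and the shortened inline sample markup are all assumptions. It also assumes libxml keeps the `<div>` and `<table>` nested inside each numbered `<span>` when parsing.

```php
<?php
// Hypothetical sample that mirrors the question's markup in shortened form.
$html = <<<'HTML'
<span class="1">
  <div class="headerClass">Here you have <span class="spanClass1">some text</span>. And here there is <span class="spanClass2">even more text</span></div>
  <table class="tableClass">
    <tr><td>some text</td><td><a href="http://www.website1.com" target="_blank">My Link</a></td></tr>
    <tr><td>some text</td><td><a href="http://www.website2.com" target="_blank">My Link</a></td></tr>
  </table>
</span>
<span class="2">
  <div class="headerClass">Second <span class="spanClass2">block</span></div>
  <table class="tableClass"></table>
</span>
HTML;

// extractBlocks() is an illustrative helper, not part of the DOM extension.
function extractBlocks(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($dom);
    $blocks = [];

    // Same predicate as in the question: only spans with a numeric class match.
    foreach ($xpath->query('//span[number(@class)=number(@class)]') as $span) {
        $urls = [];
        // Passing $span as the second argument makes the path relative to it.
        foreach ($xpath->query(".//table[@class='tableClass']//a/@href", $span) as $href) {
            $urls[] = $href->value;
        }

        // evaluate() returns a plain string for normalize-space(), tags already stripped.
        $blocks[$span->getAttribute('class')] = [
            'header' => $xpath->evaluate("normalize-space(div[@class='headerClass'])", $span),
            'span2'  => $xpath->evaluate("normalize-space(div[@class='headerClass']/span[@class='spanClass2'])", $span),
            'urls'   => $urls,
        ];
    }

    return $blocks;
}

print_r(extractBlocks($html));
```

The result is keyed by the span's class, so each entry holds the header text, the `spanClass2` text, and a list of URLs that is simply empty when the table has no rows. Note that PHP casts numeric string keys such as "1" to integers.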

Thank you very much! It really guided me to the solution! Cheers! – karlosuccess