2014-10-01 104 views

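I am trying to extract a complete div, which I have already managed to isolate from the rest of the page source. From that div I want all of the HTML content, except for some of its inner child divs. How do I get only specific elements inside a div? The HTML code: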

<div class="content"> 
    <div class="article-title"> 
     <h2>Title of the test</h2> 
     <a href="http://www.helloworld.com" title="post by world" rel="author" class="article-icon"><span class="text-icon">&#x1F464;</span>world</a> 
     <span class="article-icon"> 
      <span class="text-icon">&#x1F4C1;</span> 
       <a href="http://www.helloworld.com/world">world</a>, 
      </span> 
      <span class="article-icon"><span class="text-icon">&#x1F554;</span>20.August 2014 
     </span> 
    </div> 
    <p class="p1"> 
     <span class="s1"><b>a test</b></span> 
    </p> 
    <p class="p2"> 
     <span class="s1">text2</span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b><a href="http://www.helloworld.com/hello.jpg"> 
      <img class="alignright size-medium wp-image-19472" src="http://www.helloworld.com/hello.jpg" alt="hello" width="300" height="218"></a>Hello</b> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b>text text text</b></span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b><a href="http://www.helloworld.com/hello2.jpg"> 
      <img class="alignleft size-medium wp-image-19474" src="http://www.helloworld.com/hello2.jpg" alt="hello2" width="300" height="200"></a>Hello2</b> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1">text1</span> 
    </p> 
    <p class="p1"> 
     <span class="s1">text2</span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b>Final thoughts</b></span> 
    </p> 
    <p class="p1"> 
     <span class="s1">testing (<a href="http://www.helloworld.com/test"> 
      <span class="s2">test</span></a>, 
      <a href="http://www.helloworld.com/test2"> 
      <span class="s2">test2</span></a> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1">***</span> 
    </p> 
    <p class="p5"><em> 
     <span class="s1">xyz <a href="http://www.helloworld.com/xyz"> 
      <span class="s2">123</span></a> (at <a href="http://www.helloworld.com"> 
      <span class="s2">http://www.helloworld.com</span></a>. &#xA0; 
     </span></em> 
    </p> 
    <div class="panel-breaking-line"></div> 
    <div class="article-tags"> <b>Tags added to this article</b> 
     <div class="tagcloud"> <a href="http://www.helloworld.com/world">world</a><a href="http://www.helloworld.com/xyz">zyx</a> </div> 
    </div> 
    <div class="panel-breaking-line"></div> 
    <div class="article-socials"> <b>Share this article with friends</b> 
     <div class="social-likes"> 
      <div class="soc-button soc-button-facebook"> <a href="http://www.facebook.com/sharer/sharer.php?u=http://www.helloworld.com/world" data-url="http://www.helloworld.com/world" class="soc-click ot-share"> 
       <span class="text-icon">&#xF30C;</span>FACEBOOK</a> 
       <span class="likes-count"> 
        <span class="count">0</span> 
        <span class="bullet">&#xA0;</span> 
       </span> 
       </div> 
       <div class="soc-button soc-button-twitter"> <a href="#" class="soc-click ot-tweet" data-hashtags="" data-url="http://www.helloworld.com/world" data-via="" data-text="World"> 
        <span class="text-icon">&#xF309;</span>TWITTER</a> 
        <span class="likes-count"> 
         <span class="count">0</span> 
         <span class="bullet">&#xA0;</span> 
        </span> 
       </div> 
       <div class="soc-button soc-button-pinterest"> <a href="http://pinterest.com/pin/create/button/?url=http://www.helloworld.com/world" data-url="http://www.helloworld.com/world" class="ot-pin soc-click"> 
       <span class="text-icon">&#xF312;</span>PINTEREST</a> 
       <span class="likes-count"> 
        <span class="count">0</span> 
        <span class="bullet">&#xA0;</span> 
       </span> 
      </div> 
      <div class="soc-button soc-button-google"> <a href="https://plus.google.com/share?url=http://www.helloworld.com/world" class="ot-pluss soc-click"> 
       <span class="text-icon">&#xF30F;</span>GOOGLE+</a> 
       <span class="likes-count"> 
        <span class="count">0</span> 
        <span class="bullet">&#xA0;</span> 
       </span> 
      </div> 
     </div> 
    </div> 
</div> 

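So basically, I want all of the HTML inside the element with class="content", but without the elements with class="article-title", class="article-socials" and class="article-tags".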

So it would be stripped down to:

<div class="content"> 
    <p class="p1"> 
     <span class="s1"><b>a test</b></span> 
    </p> 
    <p class="p2"> 
     <span class="s1">text2</span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b><a href="http://www.helloworld.com/hello.jpg"> 
      <img class="alignright size-medium wp-image-19472" src="http://www.helloworld.com/hello.jpg" alt="hello" width="300" height="218"></a>Hello</b> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b>text text text</b></span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b><a href="http://www.helloworld.com/hello2.jpg"> 
      <img class="alignleft size-medium wp-image-19474" src="http://www.helloworld.com/hello2.jpg" alt="hello2" width="300" height="200"></a>Hello2</b> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1">text1</span> 
    </p> 
    <p class="p1"> 
     <span class="s1">text2</span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b>Final thoughts</b></span> 
    </p> 
    <p class="p1"> 
     <span class="s1">testing (<a href="http://www.helloworld.com/test"> 
      <span class="s2">test</span></a>, 
      <a href="http://www.helloworld.com/test2"> 
      <span class="s2">test2</span></a> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1">***</span> 
    </p> 
    <p class="p5"><em> 
     <span class="s1">xyz <a href="http://www.helloworld.com/xyz"> 
      <span class="s2">123</span></a> (at <a href="http://www.helloworld.com"> 
      <span class="s2">http://www.helloworld.com</span></a>. &#xA0; 
     </span></em> 
    </p> 
    <div class="panel-breaking-line"></div> 
    <div class="panel-breaking-line"></div> 
</div> 

With or without the wrapping content div itself...

I have tried many expressions, and this is where I have got to:

// This works, but it returns the entire content of the div

    $xpath = new DOMXPath($doc);
    $results = ''; // initialize the accumulator before appending to it
    $elements = $xpath->query(".");
    foreach ($elements as $element) {
        $results .= $element->ownerDocument->saveHTML($element);
    }

But with this expression instead of just the dot:

div[@class='content']/*[not(contains(concat(' ', @class, ' '), 'article-title')) and not(contains(concat(' ', @class, ' '), 'article-social')) and not(contains(concat(' ', @class, ' '), 'article-tags'))] 

it does not return anything. Any idea how I can get this to work?

You only need to add a leading '//': //div[@class='content']/*[not(contains(concat(' ', @class, ' '), 'article-title')) and not(contains(concat(' ', @class, ' '), 'article-social')) and not(contains(concat(' ', @class, ' '), 'article-tags'))] – har07 2014-10-01 04:34:06
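
The effect of the leading '//' can be checked with a minimal sketch. It assumes a shortened stand-in for the markup in the question (the $markup string below is hypothetical, not the full page). When no context node is passed, DOMXPath::query() evaluates a relative path from the document node, so a bare div[...] step only looks at the document's direct children, while '//' searches every descendant.

```php
<?php
// Hypothetical, shortened markup standing in for the page in the question.
$markup = '<html><body><div class="content">'
        . '<div class="article-title"><h2>Title</h2></div>'
        . '<p class="p1">a test</p>'
        . '<div class="article-tags">tags</div>'
        . '</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Relative path: looks for a <div> that is a direct child of the document node.
$relative = $xpath->query("div[@class='content']");

// Leading //: looks for the <div> anywhere in the document.
$absolute = $xpath->query("//div[@class='content']");

echo $relative->length, "\n"; // 0: the document node's only element child is <html>
echo $absolute->length, "\n"; // 1: the content div is found among the descendants
```

If the content div has already been isolated as a DOMNode, passing it as the second argument of query() is another way to make a relative expression resolve against it.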

Answers

You can just list them explicitly, each inside its own not(contains()):

$dom = new DOMDocument(); 
$dom->formatOutput = true; 
$dom->loadHTML($markup); 

$xpath = new DOMXpath($dom); 

$elements = $xpath->query(' 
//div[@class="content"]/*[ 
    not(contains(@class, "article-title")) and 
    not(contains(@class, "article-socials")) and 
    not(contains(@class, "article-tags")) 
] 
'); 

$html = ''; 
foreach ($elements as $child) { 
    $html .= $dom->saveXML($child); 
} 

echo htmlentities($html); 
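
One caveat with contains(@class, "article-tags") is that it is a substring test, so it would also exclude a class such as "article-tags-extra". The following is a sketch of a stricter variant, not part of the answer above: it pads the class attribute with spaces so that only whole class names match. The shortened $markup, the "article-tags-extra" element and the $excluded array are assumptions added for illustration.

```php
<?php
// Hypothetical, shortened markup standing in for the page in the question.
$markup = '<html><body><div class="content">'
        . '<div class="article-title"><h2>Title</h2></div>'
        . '<p class="p1">a test</p>'
        . '<div class="article-tags-extra">kept: only shares a prefix</div>'
        . '<div class="article-tags">tags</div>'
        . '<div class="article-socials">share</div>'
        . '</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

// Classes to exclude, as named in the question.
$excluded = ['article-title', 'article-socials', 'article-tags'];

// Build one not(contains(...)) test per class. Padding both the attribute
// and the class name with spaces makes the test match whole tokens only.
$tests = [];
foreach ($excluded as $class) {
    $tests[] = sprintf(
        'not(contains(concat(" ", normalize-space(@class), " "), " %s "))',
        $class
    );
}
$query = '//div[@class="content"]/*[' . implode(' and ', $tests) . ']';

// Serialize every remaining child of the content div.
$html = '';
foreach ($xpath->query($query) as $child) {
    $html .= $dom->saveHTML($child);
}

echo $html, "\n";
```

With this markup, the paragraph and the "article-tags-extra" div are kept, and the three excluded blocks are dropped.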

It works, except that for some reason I had to remove the htmlentities function... not sure why! – TheGreatOne 2014-10-04 04:34:28
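
A likely explanation, offered as an assumption since the thread does not confirm it: htmlentities() turns characters such as <, > and " into entities. That is useful for showing markup as literal text in a browser, but it gets in the way when the extracted HTML is meant to be stored or processed further. A minimal sketch with a one-line fragment:

```php
<?php
// A fragment like the ones produced by the loop in the answer.
$html = '<p class="p1">a test</p>';

// Escaped copy: suitable for displaying the markup as literal text on a page.
$escaped = htmlentities($html);

echo $escaped, "\n"; // &lt;p class=&quot;p1&quot;&gt;a test&lt;/p&gt;

// Unescaped original: suitable for storing or re-parsing the fragment.
echo $html, "\n";    // <p class="p1">a test</p>
```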

@TheGreatOne I'm glad this helped – Ghost 2014-10-04 04:36:06