2016-12-06 282 views
-3

How can I remove links (<a href=...> tags) from an HTML file using Perl?

I want to remove these links:

  • Typography
  • Shortcodes
  • Tables
  • FAQ

I don't want to remove these links:

    <ul class="dropdown-menu"> 
    
         <li><a href="index_fixed.html">Home/Fixed</a></li> 
         <li><a href="index_with_blog.html">Home + Blog</a></li> 
         <li><a href="portfolio.html">Portfolio</a></li> 
         <li><a href="blog.html">Blog & News</a></li> 
        </ul> 
        </li> 
    
        <li><a href="left_sidebar.html">left sidebar</a></li> 
        <li><a href="right_sidebar.html">right sidebar</a></li> 
        <li><a href="full_width.html">full page</a></li> 
        <li><a href="contact.html">contact us</a></li> 
    
    </ul> 
    

This is my code, but it is not working:

    #!/usr/bin/perl 
    ########################################## Carrega Modulos 
    
    use LWP::UserAgent; 
    use LWP::Simple; 
    
    $ua = new LWP::UserAgent; 
    $ua->agent('Mozilla/5.0 (X11; U; NetBSD i386; en-US; rv:1.8.1.12) Gecko/20080301 Firefox/2.0.0.12'); 
    
    my $pedido1 = new HTTP::Request GET =>"http://localhost/site1/index.html"; 
    my $resposta1 = $ua->request($pedido1) or die "Error\n"; 
    my $res1 = $resposta1->content; 
    open (OUT, ">>hit.txt"); print OUT "$res1\n"; close(OUT); $cont=$cont+1; 
    
    $res1 =~ s/"<li><a href=\"typography.html\">Typography<\/a><\/li>"/""/g; 
    $res1 =~ s/"<li><a href=\"shortcodes.html\">Shortcodes<\/a><\/li>"/""/g; 
    $res1 =~ s/"<li><a href=\"blog.html\">Blog & News<\/a><\/li>"/""/g; 
    $res1 =~ s/"<li><a href=\"tables.html\">Tables<\/a><\/li>"/""/g; 
    $res1 =~ s/"<li><a href=\"faq.html\">FAQ<\/a><\/li>"/""/g; 
    print $res1; 
    

This is my HTML:

    <!DOCTYPE html> 
    
    Reponsive HTML Template 
    
    http://fonts.googleapis.com/css?family=Roboto:400,300,700,100' rel='stylesheet' type='text/css'> 
    
    
    
    <!-- Collect the nav links, forms, and other content for toggling --> 
    
        <div class="collapse navbar-center navbar-collapse" id="bs-example-navbar-collapse-1"> 
    
        <ul class="nav navbar-nav"> 
    
         <li class="active"><a href="index.html">Home</a></li> 
         <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Pages</a> 
    
         <ul class="dropdown-menu"> 
    
          <li><a href="index_fixed.html">Home/Fixed</a></li> 
          <li><a href="index_with_blog.html">Home + Blog</a></li> 
          <li><a href="portfolio.html">Portfolio</a></li> 
          <li><a href="typography.html">Typography</a></li> 
          <li><a href="shortcodes.html">Shortcodes</a></li> 
          <li><a href="blog.html">Blog & News</a></li> 
          <li><a href="tables.html">Tables</a></li> 
          <li><a href="faq.html">FAQ</a></li> 
    
         </ul> 
         </li> 
    
         <li><a href="left_sidebar.html">left sidebar</a></li> 
         <li><a href="right_sidebar.html">right sidebar</a></li> 
         <li><a href="full_width.html">full page</a></li> 
         <li><a href="contact.html">contact us</a></li> 
    
        </ul> 
        </div> 
    
        <!-- /.navbar-collapse --> 
    
        <div class="clr"></div> 
    
        <!-- Brand and toggle get grouped for better mobile display --> 
    
        <div class="navbar-header"> 
    
        <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1"> <span class="sr-only">Toggle navigation</span> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> 
    
        <h1 class="navbar-brand"><a href="index.html"><span>anti</span>que</a></h1> 
    
        </div> 
    </nav> 
    
    <!-- Indicators --> 
    
    <div class="carousel-inner"> 
    
        <div class="item"> <img data-src="images/slider/slider1.jpg" alt="First slide" src="images/slider/slider1.jpg"> 
    
        <div class="container"> 
         <div class="carousel-caption"> 
    
         <h1>Vivamus ultricies volutpat egestas. Donec <span>turpis non eros</span> euismod </h1> 
    
         <p>Aliquam sit amet lectus sagittis, feugiat neque dictum, rutrum augue. Integer vel egestas urna. </p> 
         <p><a class="btn btn-default" href="#" role="button">more details</a></p> 
    
         </div> 
        </div> 
        </div> 
    
        <div class="item active"> <img data-src="images/slider/slider2.jpg" alt="Second slide" src="images/slider/slider2.jpg"> 
    
        <div class="container"> 
         <div class="carousel-caption"> 
         <h1>Donec <span>volutpat mattis</span> odio. Quisque eros. Nullam malesuada. </h1> 
    
         <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec odio. Quisque volutpat mattis eros. </p> 
         <p><a class="btn btn-default" href="#" role="button">get started</a></p> 
    
         </div> 
        </div> 
        </div> 
    </div> 
    
    <a class="left carousel-control" href="#myCarousel" data-slide="prev"><span class="glyphicon carousel-control-left"></span></a> <a class="right carousel-control" href="#myCarousel" data-slide="next"><span class="glyphicon carousel-control-right"></span></a> </div> 
    
        <h2 class="text-center">Phasellus ultrices nulla quis nibh. Quisque a lectus. Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</h2> 
    
        <p class="text-center big-paragraph">Suspendisse urna nibh, viverra non, semper suscipit, posuere a, pede.</p> 
    
    </div> 
    
        <h2><span>our services</span></h2> 
    
        <div class="row"> 
        <div class="col-md-4"> <img src="images/icons/ico1.png" alt="icon" class="icon"> 
    
         <h3>CLEAN FLAT & MINIMAL</h3> 
    
         <img src="images/content__images/img1.jpg" alt="image" class="img-rounded img-responsive"> 
    
         <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec odio. Quisque volutpat mattis eros.</p> 
         <p><a class="btn btn-lg btn-primary" href="#" role="button">Learn more</a></p> 
    
        </div> 
    
        <div class="col-md-4"> <img src="images/icons/ico2.png" alt="icon" class="icon"> 
    
         <h3>FULLY RESPONSIVE</h3> 
    
         <img src="images/content__images/img2.jpg" alt="image" class="img-rounded img-responsive"> 
    
         <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec odio. Quisque volutpat mattis eros.</p> 
         <p><a class="btn btn-lg btn-primary" href="#" role="button">Learn more</a></p> 
    
        </div> 
    
        <div class="col-md-4"> <img src="images/icons/ico3.png" alt="icon" class="icon"> 
    
         <h3>EASY TO CUSTOMIZE</h3> 
    
         <img src="images/content__images/img3.jpg" alt="image" class="img-rounded img-responsive"> 
    
         <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec odio. Quisque volutpat mattis eros.</p> 
         <p><a class="btn btn-lg btn-primary" href="#" role="button">Learn more</a></p> 
    
        </div> 
        </div> 
    </div> 
    
        <h2 class="text-center"><span>about us</span></h2> 
    
        <div class="row text-center"> 
        <div class="col-md-6"> 
    
         <h3>Donec odio. Quisque volutpat mattis eros. 
    
         Nullam malesuada erat. </h3> 
    
         <p><small>Praesent semper mod quis eget mi. Etiam eu ante risus. </small></p> 
    
         <img src="images/content__images/pic1.jpg" class="img-rounded img-responsive" alt="pic1"> 
    
         <div class="clearfix"></div> 
    
         <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec odio. Quisque volutpat mattis eros. Nullam malesuada erat ut turpis. Suspendisse urna nibh, viverra non, semper suscipit, posuere a, pede.</p> 
    
         <p><a class="btn btn-info btn-lg" href="#" role="button">Learn more</a></p> 
    
        </div> 
    
        <div class="col-md-6"> 
    
         <h3>Etiam eu ante risus. Aliquam erat volutpat. 
    
         Aliquam luctus mattis.</h3> 
    
         <p><small>Praesent semper mod quis eget mi. Etiam eu ante risus. </small></p> 
    
         <img src="images/content__images/pic2.jpg" class="img-rounded img-responsive" alt="pic2"> 
    
         <div class="clearfix"></div> 
    
         <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec odio. Quisque volutpat mattis eros. Nullam malesuada erat ut turpis. Suspendisse urna nibh, viverra non, semper suscipit, posuere a, pede.</p> 
    
         <p><a class="btn btn-info btn-lg" href="#" role="button">Learn more</a></p> 
    
        </div> 
        </div> 
    </div> 
    
    <div class="container"> 
    
        <h2 class="">Pellentesque egestas sem. Suspendisse commodo ullamcorper magna. Pellentesque egestas sem suspendisse commodo ullamcorper ...</h2> 
    
        <p class="">Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore eritatis et quasi architecto beatae vitae dicta sunt explicabo. 
    
        Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione 
    
        voluptatem sequi nesciunt.</p> 
    
        <p><a class="btn btn-info" href="#" role="button">Buy it</a><a class="btn btn-default" href="#" role="button">Learn More</a></p> 
    
    </div> 
    
        <h3>About</h3> 
    
        <p>We strive to deliver a level of service that exceeds the expectations of our customers. <br /> 
    
         <br /> 
    
         If you have any questions about our products or services, please do not hesitate to contact us. We have friendly, knowledgeable representatives available seven days a week to assist you.</p> 
    
        </div> 
    
        <div class="col-md-3"> 
    
        <h3>Tweets</h3> 
    
        <p><span>Tweet</span> <a href="#">@You</a><br /> 
    
         Etiam egestas, ipsum posuere accumsan sollicitudin, nulla mauris volutpat sem, sit amet rutrum risus. </p> 
    
        <p><span>Tweet</span> <a href="#">@You</a><br /> 
    
         Quisque porta tellus vitae adipiscing molestie. Mauris et lacus blandit, malesuada.</p> 
    
        </div> 
    
        <div class="col-md-3"> 
    
        <h3>Mailing list</h3> 
    
        <p>Subscribe to our mailing list for offers, news updates and more!</p> 
    
        <br /> 
    
        <form action="#" method="post" class="form-inline" role="form"> 
    
         <div class="form-group"> 
    
         <label class="sr-only" for="exampleInputEmail2">your email:</label> 
    
         <input type="email" class="form-control" id="exampleInputEmail2" placeholder="your email:"> 
    
         </div> 
    
         <button type="submit" class="btn btn-primary">subscribe</button> 
    
        </form> 
        </div> 
    
        <div class="col-md-3"> 
    
        <h3>Business</h3> 
    
        <p>Street<br /> 
    
         City, State <br /> 
    
         Country<br /> 
    
         <br /> 
    
         Phone: (111) 123-4567<br /> 
    
         Fax: (111) 123-4567<br /> 
    
         <br /> 
        </p> 
    
        <div class="social__icons"> <a href="#" class="socialicon socialicon-twitter"></a> <a href="#" class="socialicon socialicon-facebook"></a> <a href="#" class="socialicon socialicon-google"></a> </div> 
    
        </div> 
    </div> 
    
    $('.carousel').carousel({ interval: 3500, // in milliseconds pause: 'none' // set to 'true' to pause slider on mouse hover }) 
    

Thanks very much.

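+0

Don't post your entire code; post only the relevant parts. As it stands, your question is a huge wall of text, and most people will simply skip it. – Aserre

+1

See http://stackoverflow.com/help/mcve. – chris85

+0

Thanks, chris85. I thought putting all the information together would leave no room for doubt. – Domenike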

Answers

2

Use a parser to handle changes in HTML. XML::LibXML can parse HTML:

    #!/usr/bin/perl 
    use warnings; 
    use strict; 
    
    use XML::LibXML; 
    
    my $html = ...; # load the HTML file 
    my $dom = 'XML::LibXML'->load_html(string => $html, recover => 1); 
    
    my @delete = qw(Typography Shortcodes Tables FAQ); 
    my $condition = join ' or ', map "text()='$_'", @delete; 
    
    # The trailing /.. selects each matching anchor's parent, i.e. the <li>. 
    for my $li ($dom->findnodes("//a[$condition]/..")) { 
        $li->parentNode->removeChild($li); 
    } 
    print $dom; 
    

It removes not just the anchors, but their parent <li> elements as well.
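To make the construction of `$condition` easier to follow, here is a minimal sketch that only builds and prints the XPath expression, without parsing any HTML. The variable `$xpath` is a name introduced here for illustration; the answer passes the same string to `findnodes` directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @delete = qw(Typography Shortcodes Tables FAQ);

# map turns each label into an XPath test on the anchor's text;
# join glues those tests together with ' or '.
my $condition = join ' or ', map { "text()='$_'" } @delete;

# $xpath is a hypothetical helper name, not part of the answer's code.
my $xpath = "//a[$condition]/..";

print "$condition\n";
print "$xpath\n";
```

This prints `text()='Typography' or text()='Shortcodes' or text()='Tables' or text()='FAQ'` followed by the same expression wrapped in `//a[...]/..`. Note that a label containing a single quote would break the quoting in this simple approach.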

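+0

Nice solution. However, I wonder whether the OP understands how `$condition` is built. – Robert

+0

I'm pretty sure XML::LibXML will choke on what the OP posted... line 5 contains a closing '>' but no corresponding '<', they use self-closing tags, e.g. '' without a trailing slash, etc. – ThisSuitIsBlackNot

+0

@ThisSuitIsBlackNot: I only removed the '' from the string. Oops, it may have been an older version of the sample; it works now. – choroba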
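As a side note on why the substitutions in the question probably never match: inside `s/.../.../` the double quotes are ordinary characters, so each pattern only matches text that is itself wrapped in `"`, and the replacement `""` would insert two quote characters rather than nothing. The sketch below compares the quoted pattern with an unquoted one. The one-line `$html` sample and the names `$quoted` and `$unquoted` are assumptions made for illustration, not the fetched page. A parser is still the more robust choice, since a regex like this breaks as soon as whitespace or attributes change.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical one-line sample standing in for the fetched page.
my $html = '<ul><li><a href="faq.html">FAQ</a></li><li><a href="blog.html">Blog</a></li></ul>';

# As in the question: the pattern starts and ends with a literal ",
# so it only matches text that is itself wrapped in double quotes.
(my $quoted = $html) =~ s/"<li><a href=\"faq.html\">FAQ<\/a><\/li>"/""/g;

# Without the extra quotes, using {} delimiters so / needs no escaping
# and \Q...\E so the dot in the file name is matched literally.
(my $unquoted = $html) =~ s{<li><a href="\Qfaq.html\E">FAQ</a></li>}{}g;

print "quoted pattern changed the text:   ", ($quoted   ne $html ? "yes" : "no"), "\n";
print "unquoted pattern changed the text: ", ($unquoted ne $html ? "yes" : "no"), "\n";
print "$unquoted\n";
```

With this sample the quoted pattern leaves the string untouched, while the unquoted one removes the FAQ list item.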
