html
  • regex
  • 2016-04-22 50 views 1 likes 
    1

    Scrape anchor (<a>) tags from HTML

    My goal is to scrape the tags that have a valid link in the href attribute.

    I think I'm very close to the answer. This is the regex I wrote:

    <a .*href=("|').*\.asp("|').*?>.*?<\/a> 
    

    http://regexr.com/3d989

    FIRST ISSUE:

    Result:

    <a id='topnavbtn_tutorials' href='javascript:void(0);' onclick='w3_open_nav("tutorials")' title='Tutorials'>TUTORIALS <i class='fa fa-caret-down'></i><i class='fa fa-caret-up' style='display:none'></i></a><a id='topnavbtn_references' href='javascript:void(0);' onclick='w3_open_nav("references")' title='References'>REFERENCES <i class='fa fa-caret-down'></i><i class='fa fa-caret-up' style='display:none'></i></a><a id='topnavbtn_examples' href='javascript:void(0);' onclick='w3_open_nav("examples")' title='Examples'>EXAMPLES <i class='fa fa-caret-down'></i><i class='fa fa-caret-up' style='display:none'></i></a><a href='/forum/default.asp'>FORUM</a> 
    

    I only need:

    <a href='/forum/default.asp'>FORUM</a> 
    

    SECOND ISSUE:

    Result:

    <a href='/html/default.asp' class='w3-hide-small' title='HTML Tutorial'>HTML</a><a href='/css/default.asp' class='w3-hide-small' title='CSS Tutorial'>CSS</a><a href='/js/default.asp' class='w3-hide-small' title='JavaScript Tutorial'>JAVASCRIPT</a><a href='/sql/default.asp' class='w3-hide-small' title='SQL Tutorial'>SQL</a><a href='/php/default.asp' class='w3-hide-small' title='PHP Tutorial'>PHP</a><a href='/bootstrap/default.asp' class='w3-hide-small' title='Bootstrap Tutorial'>BOOTSTRAP</a><a href='/jquery/default.asp' class='w3-hide-small' title='jQuery Tutorial'>JQUERY</a><a href='/angular/default.asp' class='w3-hide-small' title='Angular Tutorial'>ANGULAR</a><a href='/xml/default.asp' class='w3-hide-small' title='XML Tutorial'>XML</a> 
    

    I need them as separate results:

    <a href='/html/default.asp' class='w3-hide-small' title='HTML Tutorial'>HTML</a> 
    
    <a href='/css/default.asp' class='w3-hide-small' title='CSS Tutorial'>CSS</a> 
    
    <a href='/js/default.asp' class='w3-hide-small' title='JavaScript Tutorial'>JAVASCRIPT</a> 
    

    and so on...
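Both issues appear to share one cause: the greedy `.*` in the pattern. The sketch below is a minimal reproduction, assuming a shortened two-anchor input (`sample` is hypothetical, not the full page) and adding the `g` flag to collect every match.

```javascript
// Shortened, hypothetical input: one javascript: anchor followed by one real link.
const sample =
  "<a href='javascript:void(0);'>TUTORIALS</a>" +
  "<a href='/forum/default.asp'>FORUM</a>";

// The pattern from the question, with the g flag added to collect every match.
const original = /<a .*href=("|').*\.asp("|').*?>.*?<\/a>/g;

// The greedy `.*` after `<a ` runs to the end of the line and backtracks only as
// far as the LAST `href=`, so the match starts at the first `<a ` and swallows
// every anchor that precedes the .asp link.
const found = sample.match(original) || [];

console.log(found.length);        // 1
console.log(found[0] === sample); // true: the single match spans the whole input
```

Making the later quantifiers lazy does not help here, because the over-matching happens in the first `.*`, before `href=` is reached.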

    +0

    "This is the regex I wrote" - that is a link. Put your code in the question. – Quentin

    Answers

    -1
    $string = "<a href='/html/default.asp' class='w3-hide-small' title='HTML Tutorial'>HTML</a><a href='/css/default.asp' class='w3-hide-small' title='CSS Tutorial'>CSS</a><a href='/js/default.asp' class='w3-hide-small' title='JavaScript Tutorial'>JAVASCRIPT</a><a href='/sql/default.asp' class='w3-hide-small' title='SQL Tutorial'>SQL</a><a href='/php/default.asp' class='w3-hide-small' title='PHP Tutorial'>PHP</a><a href='/bootstrap/default.asp' class='w3-hide-small' title='Bootstrap Tutorial'>BOOTSTRAP</a><a href='/jquery/default.asp' class='w3-hide-small' title='jQuery Tutorial'>JQUERY</a><a href='/angular/default.asp' class='w3-hide-small' title='Angular Tutorial'>ANGULAR</a><a href='/xml/default.asp' class='w3-hide-small' title='XML Tutorial'>XML</a>"; 
    
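    // The lazy .*? stops at the first '> and the first </a>, so each anchor is matched on its own 
    preg_match_all('%<a href=\'/.*?\'>.*?</a>%s', $string, $matches, PREG_PATTERN_ORDER); 
    foreach ($matches[0] as $anchor) { 
        echo $anchor, PHP_EOL; // print one anchor per line 
    } 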
    

    OUTPUT:

    <a href='/html/default.asp' class='w3-hide-small' title='HTML Tutorial'>HTML</a> 
    <a href='/css/default.asp' class='w3-hide-small' title='CSS Tutorial'>CSS</a> 
    <a href='/js/default.asp' class='w3-hide-small' title='JavaScript Tutorial'>JAVASCRIPT</a> 
    <a href='/sql/default.asp' class='w3-hide-small' title='SQL Tutorial'>SQL</a> 
    <a href='/php/default.asp' class='w3-hide-small' title='PHP Tutorial'>PHP</a> 
    <a href='/bootstrap/default.asp' class='w3-hide-small' title='Bootstrap Tutorial'>BOOTSTRAP</a> 
    <a href='/jquery/default.asp' class='w3-hide-small' title='jQuery Tutorial'>JQUERY</a> 
    <a href='/angular/default.asp' class='w3-hide-small' title='Angular Tutorial'>ANGULAR</a> 
    <a href='/xml/default.asp' class='w3-hide-small' title='XML Tutorial'>XML</a> 
    

    DEMO:

    https://ideone.com/eFHU8n


    Note:

    Why you shouldn't use regex to parse html

    +0

    1. Can you apply your regex at this link: http://regexr.com/ ? 2. What is the alternative to regex if I need certain tags from a long XHTML string? – ohadinho

    1

    Updated. See below.

    If you have the HTML as a string, you can do something like this:

    // split the string up by anchor tags 
    // nested anchor tags are illegal, so this seems feasible: 
    var anchorArray = str.replace(/><a/g, '>¶<a').split('¶'); // ¶ is a placeholder to split on 
    
    var matches = []; 
    // no g flag: a global regex keeps lastIndex between test() calls and would skip matches 
    var re = /<a .*href=["'].*\.asp["'].*?>.*?<\/a>/; 
    
    // filter out the anchor elements with actual links in the final HTML 
    // filter() returns a new array, so keep its result 
    var remaining = anchorArray.filter(function(element) { 
        if (re.test(element)) { 
            matches.push(element); // keep the match in an array (2nd condition) 
            return false; 
        } 
        return true; 
    }); 
    
    var returnedHTML = remaining.join(''); // HTML w/o actual links (1st condition) 
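One pitfall when calling `test()` repeatedly on the same regex object, as the filter callback does: with the `g` flag the regex remembers `lastIndex` from the previous call. The sketch below illustrates this with a hypothetical three-item array and a simplified pattern; the names `items`, `globalRe` and `plainRe` are illustrative only.

```javascript
// Hypothetical, shortened anchors; each one contains a .asp link.
const items = [
  "<a href='/html/default.asp'>HTML</a>",
  "<a href='/css/default.asp'>CSS</a>",
  "<a href='/js/default.asp'>JS</a>",
];

const globalRe = /\.asp["']/g; // g flag: test() resumes from the stored lastIndex
const plainRe = /\.asp["']/;   // no g flag: every test() starts at index 0

// After a successful test(), globalRe.lastIndex points past the match, so the
// next string is searched from that offset and the link can be missed.
const withGlobal = items.map((s) => globalRe.test(s));
const withPlain = items.map((s) => plainRe.test(s));

console.log(withGlobal); // [ true, false, true ]
console.log(withPlain);  // [ true, true, true ]
```

Dropping the `g` flag, or resetting `lastIndex` to 0 before each call, avoids the skipped element.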
    

    Note that the preferred way to parse HTML is not with a regular expression but with an HTML parser.
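If a regex is used anyway, one way to keep each match inside a single anchor is to replace `.*` with negated character classes. This is a minimal sketch, not a general HTML parser: the input `html` is a shortened, hypothetical version of the question's markup, and the pattern assumes attribute values never contain `>` and that nothing like `data-href` precedes the real `href`.

```javascript
// Shortened, hypothetical input modeled on the markup in the question.
const html =
  "<a id='topnavbtn_tutorials' href='javascript:void(0);' title='Tutorials'>" +
  "TUTORIALS <i class='fa fa-caret-down'></i></a>" +
  "<a href='/forum/default.asp'>FORUM</a>" +
  "<a href='/html/default.asp' class='w3-hide-small' title='HTML Tutorial'>HTML</a>";

// [^>]* cannot cross the end of the opening tag, and [^"'>]* cannot leave the
// href value, so each match is confined to a single anchor.
// \1 requires the closing quote to be the same character as the opening one.
const anchorRe = /<a\s[^>]*href=(["'])[^"'>]*\.asp\1[^>]*>[\s\S]*?<\/a>/g;

const links = html.match(anchorRe) || [];

console.log(links.length); // 2
console.log(links[0]);     // <a href='/forum/default.asp'>FORUM</a>
```

The `javascript:void(0);` anchor is skipped because its `href` value does not end in `.asp`, and the two real links come back as separate array entries.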

    +0

    This solution gives a false positive: 'TUTORIALS ' – Adib

    +0

    Am I missing something? https://regex101.com/r/mN8zN2/2 It doesn't match. – timolawl

    +0

    Try this: https://regex101.com/r/mN8zN2/3 (directly from the example in the link by OP) – Adib

    0

    This will help you:

    var matches = []; 
    
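    // \w and \/ were mistyped as /w and \a, which made the original pattern a syntax error 
    input_content.replace(/[^<]*(<a href="([^"]+)">\w*<\/a>)/g, function() { 
        matches.push(Array.prototype.slice.call(arguments, 1, 3)); // [whole anchor, href value] 
    }); 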
    

    It returns all the matches in the matches array variable!
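The pattern above only accepts double-quoted `href` values, `href` as the sole attribute, and link text made of word characters, which may explain why some links are missed. The sketch below loosens those constraints and collects the captures with `matchAll`; `input`, `pairRe` and `pairs` are hypothetical names, and the same caveats about regex-based HTML handling apply.

```javascript
// Hypothetical input mixing both quote styles and an extra attribute.
const input =
  '<a href="/sql/default.asp">SQL</a>' +
  "<a href='/php/default.asp' class='w3-hide-small'>PHP</a>";

// Group 1 is the quote character, group 2 the href value,
// group 3 the anchor's inner HTML.
const pairRe = /<a\s[^>]*href=(["'])([^"'>]*)\1[^>]*>([\s\S]*?)<\/a>/g;

// matchAll yields one match object per anchor; keep only [href, text].
const pairs = Array.from(input.matchAll(pairRe), (m) => [m[2], m[3]]);

console.log(pairs);
```

Each entry of `pairs` holds the link target and its text, so filtering for `.asp` targets can be done afterwards with ordinary string checks.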

    +0

    This only provides 4 matches. There are more links that qualify – Adib
