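Add a trailing slash to links with preg_replace

2017-07-28

Some internal links in my website's content have no trailing "/", and this is causing crawl issues for me. I want to search for these links and replace them, so that https://www.example.com/slug becomes https://www.example.com/slug/. I am using the function below to pass a page's entire content through and replace all the necessary links on the page: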

function str_replace_links($subject, &$count) { 
    //match the first part of the link http://www.example.com{/slug} 
    $regex = '/(https:\/\/www.example.com)(\/[a-zA-Z_0-9\-]*)*'; 
    //check for the trailing '/' or if it is a file 
    $regex .= '([^(\/|\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.torrent|\.ttf|\.woff|\.svg|\.eot|\.woff2)])'; 
    //finish off regex
    $regex .= '/i'; 
    $i; // counter for # changed 
    $content = preg_replace($regex, '$1$2/', $subject, 1, $i); 
    $count += $i; 
    return $content; 
} 

I tested it on a string containing several links:

$string =' 
<a href="https://www.example.com/slug1/page">1</a><br/> 
<a href="https://www.example.com/slug2/page">2</a><br/> 
<a href="https://www.example.com/slug1/page/">3</a><br/> 
<a href="https://www.example.com/slug2/page/">4</a><br/> 
<a href="https://www.example.com/">5</a><br/> 
<a href="https://www.example.com">5b</a><br/> 
<a href="https://www.example.com/style.css">6</a><br/> 
<a href="https://www.example.com/style.jpg">7</a><br/> 
<a href="https://www.example.com/style.png">8</a><br/> 
<a href="https://www.example.com/style.pdf">9</a><br/> 
'; 

echo str_replace_links($string, $switch); 

However, this does not produce the correct result:

<a href="https://www.example.com/page/>1</a><br/> 
<a href="https://www.example.com/page/>2</a><br/> 
<a href="https://www.example.com//>3</a><br/> 
<a href="https://www.example.com//>4</a><br/> 
<a href="https://www.example.com//>5</a><br/> 
<a href="https://www.example.com/>5b</a><br/> 
<a href="https://www.example.com/st/le.css">6</a><br/> 
<a href="https://www.example.com/st/le.jpg">7</a><br/> 
<a href="https://www.example.com/st/le.png">8</a><br/> 
<a href="https://www.example.com/st/le.pdf">9</a><br/> 

Any help with the regex would be greatly appreciated.

Answer

You can do this with an adapted URL validator.

~(?i)(?<=")((?!mailto:)(?:[a-z]*:\/\/)?(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{a1}-\x{ffff}0-9]+-?)*[a-z\x{a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{a1}-\x{ffff}0-9]+-?)*[a-z\x{a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{a1}-\x{ffff}]{2,})))|localhost)(:\d{2,5})?(?:\/(?:[^\s/]*/)*[^\s/.]+)?)(?=")~

https://regex101.com/r/GcT8ZU/1
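
To apply the pattern from PHP, something like the following should work. Treat it as a minimal sketch rather than a drop-in fix: the constant name URL_PATTERN and the function name add_trailing_slashes are made up for illustration, and the `u` modifier is an addition, because PCRE only accepts `\x{ffff}` in UTF mode. Capture group 1 holds the whole URL, so the replacement re-emits it with a slash appended.

```php
<?php
// The pattern from this answer, with the `u` modifier added (an assumption
// of this sketch): without UTF mode PCRE refuses to compile \x{ffff}.
const URL_PATTERN = '~(?i)(?<=")((?!mailto:)(?:[a-z]*:\/\/)?(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{a1}-\x{ffff}0-9]+-?)*[a-z\x{a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{a1}-\x{ffff}0-9]+-?)*[a-z\x{a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{a1}-\x{ffff}]{2,})))|localhost)(:\d{2,5})?(?:\/(?:[^\s/]*/)*[^\s/.]+)?)(?=")~u';

// Hypothetical helper, not part of the original question or answer.
function add_trailing_slashes(string $html): string
{
    // $1 is the full URL between the double quotes; append "/" to it.
    $result = preg_replace(URL_PATTERN, '$1/', $html);

    // preg_replace() returns null on error (for example invalid UTF-8 input),
    // in which case the content is returned untouched.
    return $result ?? $html;
}

$html = '<a href="https://www.example.com/slug1/page">1</a><br/>' . "\n"
      . '<a href="https://www.example.com/slug1/page/">3</a><br/>' . "\n"
      . '<a href="https://www.example.com">5b</a><br/>' . "\n"
      . '<a href="https://www.example.com/style.css">6</a><br/>' . "\n";

// Links 1 and 5b gain a trailing slash; links 3 and 6 are left alone.
echo add_trailing_slashes($html);
```

Because of the `(?<=")` and `(?=")` lookarounds, only URLs wrapped in double quotes are touched. The pattern is not restricted to www.example.com, so any other double-quoted URL whose last path segment has no dot would get a slash as well.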

Formatted

(?i) 

(?<= ") 
(       # (1 start) 
     (?! mailto:) 
     (?: [a-z]* :\/\/)? 
     (?: 
      \S+ 
      (?: : \S*)? 
      @ 
    )? 
     (?: 
      (?: 
       (?: 
        [1-9] \d? 
        | 1 \d\d 
        | 2 [01] \d 
        | 22 [0-3] 
       ) 
       (?: 
        \. 
        (?: 1? \d{1,2} | 2 [0-4] \d | 25 [0-5]) 
       ){2} 
       (?: 
        \. 
        (?: 
          [1-9] \d? 
         | 1 \d\d 
         | 2 [0-4] \d 
         | 25 [0-4] 
        ) 
       ) 
      | (?: 
        (?: [a-z\x{a1}-\x{ffff}0-9]+ -?)* 
        [a-z\x{a1}-\x{ffff}0-9]+ 
       ) 
       (?: 
        \. 
        (?: [a-z\x{a1}-\x{ffff}0-9]+ -?)* 
        [a-z\x{a1}-\x{ffff}0-9]+ 
       )* 
       (?: 
        \. 
        (?: [a-z\x{a1}-\x{ffff}]{2,}) 
       ) 
      ) 
     | localhost 
    ) 
     (: \d{2,5})?    # (2) 
     (?: 
      \/ 
      (?: [^\s/]* /)* 
      [^\s/.]+ 
    )? 
)        # (1 end) 
(?= ") 
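
The part that decides whether a link gets a slash is the path group at the end: the last segment is `[^\s/.]+`, so it cannot be empty and cannot contain a dot. To see that piece in isolation, here is a small sketch that runs only the path group against a few sample paths. The `^` and `$` anchors and the function name path_needs_slash are additions for this demonstration; in the full pattern the `(?=")` lookahead plays the role of `$`.

```php
<?php
// Hypothetical helper: true when the path portion would match the answer's
// path group, meaning the full pattern would append a slash.
function path_needs_slash(string $path): bool
{
    // Same path group as in the formatted pattern, anchored for standalone use.
    return preg_match('~^(?:\/(?:[^\s/]*/)*[^\s/.]+)?$~', $path) === 1;
}

$samples = [
    '/slug1/page',   // plain last segment: matches
    '/slug1/page/',  // already ends in "/": last segment would be empty
    '/',             // root path: nothing after the slash
    '',              // bare host, the optional group matches nothing
    '/style.css',    // dot in the last segment: treated as a file
    '/v1.2/page',    // dots are allowed in earlier segments
];

foreach ($samples as $path) {
    printf(
        "%-16s %s\n",
        var_export($path, true),
        path_needs_slash($path) ? 'append "/"' : 'leave as is'
    );
}
```

This is also why no explicit list of file extensions is needed: any last segment containing a dot is skipped, at the cost of also skipping extension-less slugs that happen to contain a dot.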

Awesome! Thanks – jppower175