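How can I detect what is an Imgur image link and what isn't?

I'm trying to programmatically determine whether a link points to an Imgur image. Examples of Imgur image links are http://imgur.com/0AKSCQ4 and http://i.imgur.com/0AKSCQ4.jpg (the first is an indirect link and the second is direct, but the ID stays the same).

I want http://imgur.com/0AKSCQ4 to evaluate to true when asked whether it is an Imgur image link, but http://imgur.com/gallery to be false. I'm confused about how to distinguish between the two when they're both imgur.com/*letters*.

I ask because I know Reddit Enhancement Suite has this functionality. If I post http://imgur.com/gallery it doesn't offer an image button to preview it, but it does for http://imgur.com/0AKSCQ4.

So how would I be able to identify this? Finding every word that doesn't qualify, like gallery, jobs, or about in imgur.com/*whatever*, seems really hacky and would break whenever a new page is added. And there aren't always numbers in the second part, so I can't rely on that to identify it.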


2014-10-20 Doug Smith

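Surely you have a preferred framework for doing this. Consider that you should first parse the URL with a proper URL parser, then apply your tests to the hostname and relative path components (and possibly check the protocol, port, etc. as well). There is a highly developed science of URL obfuscation aimed at defeating tests based on string patterns. – 2014-10-20 01:58:28

What framework? One specifically for Imgur links? Unfortunately, I don't have one. – 2014-10-20 02:31:17

The framework you use for most of your application development. Are you offering this as a web service? Then something like ASP.NET, PHP, or Rails. Even if you're open to other implementations, it helps to say what you're most familiar with. – 2014-10-20 02:47:57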
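The first comment's suggestion (parse the URL, then test the hostname and path separately) can be sketched in JavaScript roughly as follows. This is only a sketch under assumptions: the function name `isImgurImageUrl` is hypothetical, and the rule that an image ID contains at least one digit is a heuristic, not a documented Imgur format, so letter-only IDs would be rejected.

```javascript
// Hypothetical helper; the ID rule below is an assumed heuristic, not Imgur's specification.
function isImgurImageUrl(link) {
    var url;
    try {
        url = new URL(link); // standard URL parser, available in browsers and Node.js
    } catch (e) {
        return false; // not a parseable absolute URL
    }
    // Only plain web links are considered.
    if (url.protocol !== "http:" && url.protocol !== "https:") {
        return false;
    }
    // Accept imgur.com itself or a direct subdomain such as i.imgur.com.
    // The parser lowercases the hostname and strips any port.
    if (!/^(?:[a-z0-9-]+\.)?imgur\.com$/.test(url.hostname)) {
        return false;
    }
    // Assumption: an image path is a single segment of word characters with
    // at least one digit, plus an optional 3 or 4 letter file extension.
    return /^\/\w*\d\w*(?:\.[A-Za-z]{3,4})?$/.test(url.pathname);
}

console.log(isImgurImageUrl("http://imgur.com/0AKSCQ4"));
console.log(isImgurImageUrl("http://imgur.com/gallery"));
```

Because the hostname is checked on its own, a link that merely contains "imgur.com" somewhere in its path or query string does not pass.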

Run this code snippet as a JavaScript example:

$(function () {
    /* Pattern for URL-like substrings */
    var url_re = /https?[^<"]+/g;

    /* Imgur image URL: optional subdomain, an ID containing at least one
       digit, and an optional three-letter file extension */
    var imgur_re = /^https?:\/\/(\w+\.)?imgur\.com\/(\w*\d\w*)+(\.[a-zA-Z]{3})?$/;

    /* Take this question's text as input */
    var txt = $(".post-text").html();

    /* Match all URL-like substrings in the input */
    var m;
    while ((m = url_re.exec(txt)) !== null) {
        /* Verify whether it is an Imgur image URL and show the result */
        $("#results").append("<li>" + m[0] + ": " + imgur_re.test(m[0]) + "</li>");
    }
});

<ul id="results"></ul>

<div class="post-text" itemprop="text">
<p>I'm trying to programmatically figure out whether or not an link is a link to an Imgur image or not. An example of an Imgur image link would be: <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a> or <a href="http://i.imgur.com/0AKSCQ4.jpg" rel="nofollow">http://i.imgur.com/0AKSCQ4.jpg</a> (the first is an indirect link and the latter is direct, but the ID stays the same)</p>
<p>I want <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a> to evaluate to <code>true</code> when asked if an Imgur link, but <a href="http://imgur.com/gallery" rel="nofollow">http://imgur.com/gallery</a> to be <code>false</code>. I'm confused how to distinguish between those two when they're both <code>imgur.com/*letters*</code>.</p>
<p>I ask because I know <a href="http://redditenhancementsuite.com" rel="nofollow">Reddit Enhancement Suite</a> has this functionality. If I post <a href="http://imgur.com/gallery" rel="nofollow">http://imgur.com/gallery</a> it doesn't offer an image button to preview it, but it would for <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a></p>
<p>So how would I be able to identify this? Finding every word that doesn't qualify, like <code>gallery</code>, <code>jobs</code>, or <code>about</code> in <code>imgur.com/*whatever*</code> would seem really hacky, and would break upon any new page being added. And there's not <em>always</em> numbers in the second part so I can't rely on that to identify it.</p>
</div>

<script type="text/javascript" src="//code.jquery.com/jquery-2.1.1.min.js"></script>
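For use outside a page, the same test can be wrapped in a plain function. The sketch below is an illustration rather than part of the original answer: the name `looksLikeImgurImage` and the letter-only ID are hypothetical, the dot in "imgur.com" is written escaped, and the ID part is written as `\w*\d\w*`, which accepts the same strings as `(\w*\d\w*)+`. It also shows the heuristic's limit: an ID with no digit in it is rejected.

```javascript
// Pattern from the snippet: optional subdomain, an ID with at least one digit,
// and an optional three-letter extension.
var imgur_re = /^https?:\/\/(\w+\.)?imgur\.com\/\w*\d\w*(\.[a-zA-Z]{3})?$/;

// Hypothetical wrapper name, introduced only for this example.
function looksLikeImgurImage(link) {
    return imgur_re.test(link);
}

console.log(looksLikeImgurImage("http://imgur.com/0AKSCQ4"));       // true
console.log(looksLikeImgurImage("http://i.imgur.com/0AKSCQ4.jpg")); // true
console.log(looksLikeImgurImage("http://imgur.com/gallery"));       // false
// Hypothetical letter-only ID: rejected, because the pattern requires a digit.
console.log(looksLikeImgurImage("http://imgur.com/abcdefg"));       // false
```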


2014-10-20 11:43:44 kums

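Just an alternative regex for parsing out the ID. It matches with or without "http(s)://" and extracts the ID from i.imgur.com links (including thumbnail suffixes and web pages), gallery images (which can be retrieved as normal images through the Imgur API, which I'm using), and of course regular images. Note that "www." is not matched, because imgur should redirect automatically to the address without "www", so people shouldn't be supplying such URLs. '(?:https?:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?(.+(?=[sbtmlh]\..{3,4})|.+(?=\..{3,4})|.+?(?=\s))' – cyanic 2016-03-01 15:50:34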

Edited to fix anchoring at the end (my use case needs the link to be in the middle): '(?:https?:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?(.+(?=[sbtmlh]\..{3,4})|.+(?=\..{3,4})|.+?(?:(?=\s)|$))' – cyanic 2016-03-01 15:57:44
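As a rough illustration of how an ID-extracting pattern of this kind might be used, here is a small sketch. The pattern is a best-effort reading of the comment above, so treat its exact form as an assumption, and the function name `extractImgurId` is hypothetical.

```javascript
// Assumed reading of the commenter's pattern; capture group 1 holds the ID.
var id_re = /(?:https?:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?(.+(?=[sbtmlh]\..{3,4})|.+(?=\..{3,4})|.+?(?:(?=\s)|$))/;

// Hypothetical helper: returns the captured ID, or null when nothing matches.
function extractImgurId(text) {
    var m = id_re.exec(text);
    return m ? m[1] : null;
}

console.log(extractImgurId("http://imgur.com/0AKSCQ4"));        // "0AKSCQ4"
console.log(extractImgurId("http://i.imgur.com/0AKSCQ4.jpg"));  // "0AKSCQ4"
console.log(extractImgurId("http://i.imgur.com/0AKSCQ4s.jpg")); // "0AKSCQ4" (thumbnail suffix dropped)
console.log(extractImgurId("imgur.com/gallery/0AKSCQ4"));       // "0AKSCQ4"
```

One caveat of this approach: an ID that itself ends in one of the letters s, b, t, m, l, or h directly before the file extension would lose that last character, since the pattern cannot tell it apart from a thumbnail suffix.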
