2014-10-20 24 views

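I'm trying to programmatically figure out whether a link points to an Imgur image. Examples of Imgur image links are http://imgur.com/0AKSCQ4 and http://i.imgur.com/0AKSCQ4.jpg (the first is an indirect link and the second is direct, but the ID stays the same). How can I detect which links are Imgur image links and which are not?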

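I want http://imgur.com/0AKSCQ4 to evaluate to true when checked as an Imgur image link, but http://imgur.com/gallery to evaluate to false. I'm not sure how to distinguish between the two when both have the form imgur.com/*letters*.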

I ask because I know Reddit Enhancement Suite has this functionality. If I post http://imgur.com/gallery it doesn't offer an image button to preview it, but it does for http://imgur.com/0AKSCQ4.

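So how can I identify this? Listing every word that doesn't qualify in imgur.com/*whatever*, such as gallery, jobs, or about, seems really hacky and would break whenever a new page is added. And there are not always digits in the second part, so I can't rely on that to identify an image.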

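Surely you have a preferred framework for doing this. In any case, you should first parse the URL with a proper URL parser and then apply your tests to the hostname and relative path components (possibly also checking the protocol, port, etc.). There is a highly developed science of URL obfuscation aimed at defeating tests based on string patterns. – 2014-10-20 01:58:28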
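
To illustrate the comment above, here is a minimal sketch that parses the link first and only then tests individual components. It assumes the standard URL class (available in browsers and Node.js). The function name isImgurImageLink, the list of accepted hosts, and the "ID contains at least one digit" heuristic are assumptions made for this example; as the question notes, real IDs do not always contain a digit, so treat the path test as a placeholder.

```javascript
// Hypothetical helper: parse the URL first, then test its components.
function isImgurImageLink(link) {
  let url;
  try {
    url = new URL(link); // throws on strings that are not absolute URLs
  } catch (e) {
    return false;
  }
  // Only plain web links on the default port are accepted.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  if (url.port !== "") return false;
  // Assumed host list; adjust it if other Imgur hosts should count.
  const hosts = ["imgur.com", "i.imgur.com"];
  if (!hosts.includes(url.hostname)) return false;
  // Assumed heuristic: a single path segment containing at least one digit,
  // optionally followed by a 3-4 letter file extension.
  return /^\/[A-Za-z0-9]*\d[A-Za-z0-9]*(\.[A-Za-z]{3,4})?$/.test(url.pathname);
}

console.log(isImgurImageLink("http://imgur.com/0AKSCQ4"));
console.log(isImgurImageLink("http://imgur.com/gallery"));
console.log(isImgurImageLink("http://imgur.com.evil.example/0AKSCQ4"));
```

Because the hostname is compared exactly, look-alike hosts and Imgur URLs buried in a query string are rejected, which a plain substring or pattern test on the raw string can miss.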

What framework? One specifically for Imgur links? Unfortunately, I don't have one. – 2014-10-20 02:31:17

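The framework you use for most of your application development. Are you building this as a web service? Then something like ASP.NET, PHP, or Rails. Even if you are open to other implementations, say which one you are most familiar with. – 2014-10-20 02:47:57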

Answers

Run this snippet for a JavaScript example:

$(function () {
    var url_re = /https?[^<"]+/g; /* pattern for URL-like substrings */
    var txt = $(".post-text").html(); /* take this question's text as input */

    /* pattern for an Imgur image URL: the ID part must contain at least one digit */
    var imgur_re = /^https?:\/\/(\w+\.)?imgur\.com\/\w*\d\w*(\.[a-zA-Z]{3})?$/;

    var m;
    while ((m = url_re.exec(txt)) !== null) { /* match all URL-like substrings in the input */
        /* show whether each match is an Imgur image URL */
        $("#results").append("<li>" + m[0] + ": " + imgur_re.test(m[0]) + "</li>");
    }
});

<ul id="results"></ul>

<div class="post-text" itemprop="text">
<p>I'm trying to programmatically figure out whether or not an link is a link to an Imgur image or not. An example of an Imgur image link would be: <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a> or <a href="http://i.imgur.com/0AKSCQ4.jpg" rel="nofollow">http://i.imgur.com/0AKSCQ4.jpg</a> (the first is an indirect link and the latter is direct, but the ID stays the same)</p>
<p>I want <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a> to evaluate to <code>true</code> when asked if an Imgur link, but <a href="http://imgur.com/gallery" rel="nofollow">http://imgur.com/gallery</a> to be <code>false</code>. I'm confused how to distinguish between those two when they're both <code>imgur.com/*letters*</code>.</p>
<p>I ask because I know <a href="http://redditenhancementsuite.com" rel="nofollow">Reddit Enhancement Suite</a> has this functionality. If I post <a href="http://imgur.com/gallery" rel="nofollow">http://imgur.com/gallery</a> it doesn't offer an image button to preview it, but it would for <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a></p>
<p>So how would I be able to identify this? Finding every word that doesn't qualify, like <code>gallery</code>, <code>jobs</code>, or <code>about</code> in <code>imgur.com/*whatever*</code> would seem really hacky, and would break upon any new page being added. And there's not <em>always</em> numbers in the second part so I can't rely on that to identify it.</p>
</div>

<script type="text/javascript" src="//code.jquery.com/jquery-2.1.1.min.js"></script>
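
The snippet above depends on jQuery and on the page's own markup. As a rough standalone sketch of the same idea (find URL-like substrings, then test each one against an Imgur pattern), the following runs without a DOM. The names IMGUR_IMAGE_RE and classifyLinks are made up for this example, and the pattern keeps the answer's assumption that an image ID contains at least one digit, which the question says is not always true.

```javascript
// Pattern in the spirit of the answer: optional subdomain, an ID with at
// least one digit, and an optional three-letter extension.
const IMGUR_IMAGE_RE = /^https?:\/\/(\w+\.)?imgur\.com\/\w*\d\w*(\.[a-zA-Z]{3})?$/;

// Hypothetical helper: returns every URL-like substring with its verdict.
function classifyLinks(text) {
  const urlRe = /https?:\/\/[^\s<"]+/g; // stop at whitespace, '<' or '"'
  const results = [];
  for (const match of text.matchAll(urlRe)) {
    results.push({ url: match[0], isImgurImage: IMGUR_IMAGE_RE.test(match[0]) });
  }
  return results;
}

const sample =
  'See http://imgur.com/0AKSCQ4 and http://imgur.com/gallery, ' +
  'or <a href="http://i.imgur.com/0AKSCQ4.jpg">direct</a>.';
console.log(classifyLinks(sample));
```

Note that the comma after "gallery" becomes part of that match, so punctuation around links in free text may need trimming before testing.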

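Just an alternative regex for parsing out the ID. It matches with or without "http(s)://" and extracts the ID from i.imgur.com links (including ones with a thumbnail suffix), from web pages, from gallery images (which can be retrieved as ordinary images through the Imgur API, which is what I'm using), and of course from regular images. Note that "www." is not matched, because Imgur should automatically redirect to the URL without "www", so people shouldn't be supplying such URLs. '(?:https:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?(.+(?=[sbtmlh]\..{3,4})|.+(?=\..{3,4})|.+?(?=\s))' – cyanic 2016-03-01 15:50:34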

Edit to fix the anchoring at the end (my use case needs the link to be in the middle …): '(?:https:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?(.+(?=[sbtmlh]\..{3,4})|.+(?=\..{3,4})|.+?(?:(?=\s)|$))' – cyanic 2016-03-01 15:57:44
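
For readers who want to experiment with ID extraction along the lines of the two comments above, here is a small sketch. It uses a simplified pattern written for this example, not the commenter's exact expression, and the names IMGUR_ID_RE and extractImgurId are hypothetical. It assumes IDs are alphanumeric and that a single trailing s, b, t, m, l, or h directly before a file extension is a thumbnail suffix, so an ID that really ends in one of those letters would be truncated.

```javascript
// Simplified illustration: optional protocol, optional "i." host prefix,
// optional "gallery/" segment, then the ID. A trailing thumbnail letter and
// file extension are consumed outside the capture group.
const IMGUR_ID_RE =
  /(?:https?:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?([A-Za-z0-9]+?)(?:[sbtmlh]?\.[A-Za-z0-9]{3,4})?(?=\s|$)/;

// Hypothetical helper: returns the first ID found in the text, or null.
function extractImgurId(text) {
  const m = IMGUR_ID_RE.exec(text);
  return m ? m[1] : null;
}

console.log(extractImgurId("http://imgur.com/0AKSCQ4"));
console.log(extractImgurId("http://i.imgur.com/0AKSCQ4s.jpg"));
console.log(extractImgurId("see imgur.com/gallery/0AKSCQ4 for details"));
```

This only extracts an ID candidate; it does not decide whether the link is an image, so http://imgur.com/gallery would still yield "gallery" and needs a separate check.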
