2010-06-10 46 views

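I'm working on a link checker / broken link finder and I'm getting a lot of false positives. After double-checking, I noticed that many requests throw a WebException even though the link is actually downloadable, and in other cases the status code is 404 even though I can open the page in a browser. Link checker: how can I avoid false positives?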

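So here is the code. It's pretty ugly, and I'd like to have something more, I'd say, practical. The if is there to filter out all the status codes that I don't want to add to the broken links, because those links are valid (I tested them all). What I need to fix is the structure (if possible) and how to stop getting false 404s.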

Thanks!

try
{
    HttpWebRequest request = (HttpWebRequest) WebRequest.Create (uri);
    request.Method = "HEAD";
    request.MaximumResponseHeadersLength = 32; // FOR IE SLOW SPEED
    request.AllowAutoRedirect = true;
    using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
    {
        request.Abort();
    }
    /* WebClient wc = new WebClient();
    wc.DownloadString(uri); */

    _validlinks.Add (strUri);
}
catch (WebException wex)
{
    if ( !wex.Message.Contains ("The remote name could not be resolved:") &&
         wex.Status != WebExceptionStatus.ServerProtocolViolation)
    {
        if (wex.Status != WebExceptionStatus.Timeout)
        {
            // wex.Response is null when no HTTP response was received
            // (e.g. a connection failure), so avoid dereferencing it blindly.
            HttpWebResponse errorResponse = wex.Response as HttpWebResponse;
            if (errorResponse == null || (
                errorResponse.StatusCode != HttpStatusCode.OK &&
                errorResponse.StatusCode != HttpStatusCode.BadRequest &&
                errorResponse.StatusCode != HttpStatusCode.Accepted &&
                errorResponse.StatusCode != HttpStatusCode.InternalServerError &&
                errorResponse.StatusCode != HttpStatusCode.Forbidden &&
                errorResponse.StatusCode != HttpStatusCode.Redirect &&
                errorResponse.StatusCode != HttpStatusCode.Found
            ))
            {
                _brokenlinks.Add (new Href (new Uri (strUri , UriKind.RelativeOrAbsolute) , UrlType.External));
            }
            else _validlinks.Add (strUri);
        }
        else _brokenlinks.Add (new Href (new Uri (strUri , UriKind.RelativeOrAbsolute) , UrlType.External));
    }
    else _validlinks.Add (strUri);
}
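
On the structure part of the question, one possible way to flatten the nested checks is to move the decision into a single helper. This is only a sketch: `LinkClassifier`, `Tolerated`, and `IsBroken` are hypothetical names, the status check `WebExceptionStatus.NameResolutionFailure` is swapped in for the message-text comparison in the post, and treating a missing HTTP response as broken is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper, not part of the original post.
static class LinkClassifier
{
    // Status codes the post treats as "still a valid link".
    // HttpStatusCode.Found has the same numeric value (302) as Redirect.
    static readonly HashSet<HttpStatusCode> Tolerated = new HashSet<HttpStatusCode>
    {
        HttpStatusCode.OK,
        HttpStatusCode.BadRequest,
        HttpStatusCode.Accepted,
        HttpStatusCode.InternalServerError,
        HttpStatusCode.Forbidden,
        HttpStatusCode.Redirect
    };

    public static bool IsBroken(WebException wex)
    {
        // The post counts unresolved host names and protocol violations as valid.
        // Assumption: the status enum replaces the message-text check.
        if (wex.Status == WebExceptionStatus.NameResolutionFailure ||
            wex.Status == WebExceptionStatus.ServerProtocolViolation)
            return false;

        if (wex.Status == WebExceptionStatus.Timeout)
            return true;

        // Response is null when no HTTP response arrived at all.
        // Assumption: count that case as broken.
        var response = wex.Response as HttpWebResponse;
        if (response == null)
            return true;

        return !Tolerated.Contains(response.StatusCode);
    }
}
```

With a helper like this, the catch block could shrink to a single `if (LinkClassifier.IsBroken(wex))` that adds to `_brokenlinks`, with an `else` that adds to `_validlinks`.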

Please indent your code properly! – 2010-06-10 14:55:56

@Anthony: lol, fixed (sorry for spoiling your joke). – 2010-06-10 15:01:45

Answers

1

You should add a User-Agent header, since many sites require one.
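
As a rough illustration of that suggestion, the header can be set through the `UserAgent` property of `HttpWebRequest` before the request is sent. The helper name `CreateHeadRequest` and the user-agent string are placeholders, not something from the question.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original post.
static class LinkProbe
{
    public static HttpWebRequest CreateHeadRequest(Uri uri)
    {
        var request = (HttpWebRequest) WebRequest.Create(uri);
        request.Method = "HEAD";
        request.AllowAutoRedirect = true;
        // Placeholder identity: substitute your own tool name and contact address.
        request.UserAgent = "MyLinkChecker/1.0 (+mailto:admin@example.com)";
        return request;
    }
}
```

The request built this way would replace the `WebRequest.Create (uri)` call in the question's try block; the rest of the response handling stays the same.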

What kind of user agent should I add? – 2010-06-10 15:20:12

That's up to you. It should probably include your contact information. – SLaks 2010-06-10 15:27:21

That didn't fix it. This is one of the pages where I get the error: http://www.sisweb.com/ – 2010-06-10 15:36:29