How do I build complete URLs from partial URLs found in a web page?

I can retrieve the text of a web page, let's say https://stackoverflow.com/questions, which contains some real and some made-up links:

 
    /questions 
    /tags 
    /questions?sort=votes 
    /questions?sort=active 
    randompage.aspx 
    ../coolhomepage.aspx

Knowing that my originating page was https://stackoverflow.com/questions, is there a way in .NET to resolve the links to this?

 
    https://stackoverflow.com/questions 
    https://stackoverflow.com/tags 
    https://stackoverflow.com/questions?sort=votes 
    https://stackoverflow.com/questions?sort=active 
    https://stackoverflow.com/questions/randompage.aspx 
    https://stackoverflow.com/coolhomepage.aspx

Sort of the way a browser is smart enough to resolve the links.

=========================== UPDATE - Using David's solution:

 
    'Regex to match all <a ... /a> links 
    Dim myRegEx As New Regex("\<\s*a     (?# Find opening <a tag)   " & _ 
          ".+?href\s*=\s*['""]  (?# Then all to href=' or "")  " & _ 
          "(?<href>.*?)['""]  (?# Then all to the next ' or "") " & _ 
          ".*?\>     (?# Then all to >)    " & _ 
          "(?<name>.*?)\<\s*/a\s*\> (?# Then all to </a>)    ", _ 
          RegexOptions.IgnoreCase Or _ 
          RegexOptions.IgnorePatternWhitespace Or _ 
          RegexOptions.Multiline) 

    'MatchCollection to hold all the links that are matched 
    Dim myMatchCollection As MatchCollection 
    myMatchCollection = myRegEx.Matches(Me._RawPageText) 

    'Loop through all matches and evaluate the value of the href attribute. 
    For i As Integer = 0 To myMatchCollection.Count - 1 
     Dim thisLink As String = "" 
     thisLink = myMatchCollection(i).Groups("href").Value() 
     'This checks for Javascript and Mailto links. 
     'This is not complete. There are others to check I just haven't encountered them yet. 
     If thisLink.ToLower.StartsWith("javascript") Then 
      thisLink = "JAVASCRIPT: " & thisLink 
     ElseIf thisLink.ToLower.StartsWith("mailto") Then 
      thisLink = "MAILTO: " & thisLink 
     Else 
      Dim baseUri As New Uri(Me.URL) 

      If Not thisLink.ToLower.StartsWith("http") Then 
       'This is a partial URL so we will assume that it's relative to our originating URL 
       Dim myUri As New Uri(baseUri, thisLink) 
       thisLink = "RELATIVE LOCAL LINK: RESOLVED: " & myUri.ToString() & " ORIGINAL: " & thisLink 
      Else 
       'The link starts with HTTP, determine if part of base host or is outside host. 
       Dim ThisUri As New Uri(thisLink) 
       If ThisUri.Host.ToLower = baseUri.Host.ToLower Then 
        thisLink = "INSIDE COMPLETE LINK: " & thisLink 
       Else 
        thisLink = "OUTSIDE LINK: " & thisLink 
       End If 
      End If 

     End If 

     'I'm storing the found links into a Generic.List(Of String) 
     'This link has descriptive text added to it. 
     'TODO: Make collection to hold only unique internal links. 
     Me._Links.Add(thisLink) 
    Next
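The prefix checks in the loop above (`StartsWith("http")`, `"javascript"`, `"mailto"`) can also be expressed by letting `Uri` do the parsing and then inspecting the resolved scheme and host. The following is only a sketch in C#, not part of the original update: the `LinkClassifier` class, the `Classify` method, and the label strings are hypothetical names chosen for illustration. One side effect of this approach is that protocol-relative links such as `//example.com/x` are resolved against the page's scheme instead of being treated as local paths.

```csharp
using System;

public static class LinkClassifier
{
    // Hypothetical helper: resolves href against the page URL and labels the result.
    public static string Classify(Uri pageUri, string href)
    {
        // TryCreate accepts both absolute and relative hrefs; it returns false
        // instead of throwing when the combination cannot be parsed.
        if (!Uri.TryCreate(pageUri, href, out Uri resolved))
            return "UNPARSEABLE: " + href;

        // Anything that is not http/https (mailto, ftp, ...) is reported by its scheme.
        if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
            return resolved.Scheme.ToUpperInvariant() + ": " + href;

        // Compare hosts case-insensitively to separate internal from external links.
        bool sameHost = string.Equals(resolved.Host, pageUri.Host,
                                      StringComparison.OrdinalIgnoreCase);
        return (sameHost ? "INSIDE: " : "OUTSIDE: ") + resolved.AbsoluteUri;
    }
}
```

Because every internal link comes back in absolute form, the results could be fed into a `HashSet<string>` to cover the "unique internal links" TODO, assuming exact string equality is an acceptable notion of uniqueness.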

Source

2009-05-05 rvarcher

Do you mean something like this?

Uri baseUri = new Uri("http://www.contoso.com"); 
Uri myUri = new Uri(baseUri, "catalog/shownew.htm"); 

Console.WriteLine(myUri.ToString());

Sample taken from http://msdn.microsoft.com/en-us/library/9hst1w91.aspx
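Applied to the links from the question, the same `Uri(Uri, String)` constructor can be wrapped in a small helper. This is a minimal sketch; `LinkResolver` and `Resolve` are hypothetical names, not part of the MSDN sample. Note that resolution follows the usual relative-reference rules, so whether the page URL ends in a slash matters: `randompage.aspx` replaces the last path segment of the base.

```csharp
using System;

public static class LinkResolver
{
    // Hypothetical helper: resolves a link found on a page against that page's URL.
    public static string Resolve(string pageUrl, string link)
    {
        var baseUri = new Uri(pageUrl);
        // The (Uri, string) constructor combines the base with a relative reference;
        // if link is already absolute, it is returned as-is.
        return new Uri(baseUri, link).AbsoluteUri;
    }

    public static void Main()
    {
        const string page = "https://stackoverflow.com/questions";
        string[] links =
        {
            "/questions", "/tags", "/questions?sort=votes",
            "randompage.aspx", "../coolhomepage.aspx"
        };

        foreach (string link in links)
            Console.WriteLine(link + " -> " + Resolve(page, link));

        // With the base above, "randompage.aspx" resolves to
        // https://stackoverflow.com/randompage.aspx, because "questions" is treated
        // as the last segment (a file), not a folder. A trailing slash changes that:
        Console.WriteLine(Resolve(page + "/", "randompage.aspx"));
        // -> https://stackoverflow.com/questions/randompage.aspx
    }
}
```

So if the page is meant to act as a folder for its relative links, the base URL passed in needs the trailing slash.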

Source

2009-05-05 22:25:24

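Yes, this is what I needed. It works for URLs that originate from different locations. I'll update my question to show how I implemented it. Thanks! – rvarcher 2009-05-06 17:34:54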

If you mean server-side, you can use ResolveUrl():

string url = ResolveUrl("~/questions");

Source

2009-05-05 22:19:37

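I don't understand what you mean by "resolve" in this context, but since you asked how a browser handles it, you could try inserting a base HTML element.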

"The <base> tag specifies a default address or a default target for all links on a page."

http://www.w3schools.com/TAGS/tag_base.asp

Source

2009-05-05 22:20:52
