Using XPath to get an HTML tag attribute with HTML Agility Pack

 
META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" /> 
TITLE>Microsoft Corporation 
META http-equiv="PICS-Label" content="(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0))" /> 
META NAME="KEYWORDS" CONTENT="products; headlines; downloads; news; Web site; what's new; solutions; services; software; contests; corporate news;" /> 
META NAME="DESCRIPTION" CONTENT="The entry page to Microsoft's Web site. Find software, solutions, answers, support, and Microsoft news." /> 
META NAME="MS.LOCALE" CONTENT="EN-US" /> 
META NAME="CATEGORY" CONTENT="home page" />

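I would like to know what XPath I need to use with the HTML Agility Pack to get the value of the Content attribute of the Category meta tag. (I removed the first < from each line of the HTML code so that it would post.)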


2010-07-12 Eugene

For a long time, HtmlAgilityPack did not have the ability to query an attribute value directly. You had to loop over the list of meta nodes. Here is one way:

var doc = new HtmlDocument(); 
doc.LoadHtml(htmlString); 

var list = doc.DocumentNode.SelectNodes("//meta"); 
foreach (var node in list) 
{ 
    string content = node.GetAttributeValue("content", ""); 
}

But it looks like there is an experimental XPath release that lets you do this.

doc.DocumentNode.SelectNodes("//meta/@content")

will return a list of HtmlAttribute objects.
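
For the CATEGORY tag in the question, the loop approach above could be narrowed to a single name. The following is only a minimal sketch: the class and method names are made up for illustration, and the inline HTML is a shortened copy of the question's sample with the leading < restored. It compares the name case-insensitively, because the sample writes CATEGORY in upper case.

```csharp
using System;
using HtmlAgilityPack;

public static class MetaLoopExample
{
    // Hypothetical helper: walks every <meta> element and returns the content
    // of the first one whose name matches metaName, ignoring case.
    // Returns null if no element matches.
    public static string FindMetaContent(string html, string metaName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//meta");
        if (nodes == null) return null;

        foreach (HtmlNode node in nodes)
        {
            string name = node.GetAttributeValue("name", string.Empty);
            if (string.Equals(name, metaName, StringComparison.OrdinalIgnoreCase))
                return node.GetAttributeValue("content", string.Empty);
        }
        return null;
    }

    public static void Main()
    {
        // Shortened version of the sample from the question (assumption).
        string html =
            "<html><head>" +
            "<META NAME=\"MS.LOCALE\" CONTENT=\"EN-US\" />" +
            "<META NAME=\"CATEGORY\" CONTENT=\"home page\" />" +
            "</head><body></body></html>";

        Console.WriteLine(FindMetaContent(html, "category"));
    }
}
```

Doing the comparison in C# rather than in the XPath string sidesteps the fact that an XPath predicate such as [@name='category'] compares attribute values case-sensitively.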


2010-07-12 21:41:34

Thanks for the quick response, Rohit Agarwal (I saw that it was answered only a few hours after I asked, but I could not test it until today).

I originally implemented your suggestion as follows (this is in VB.NET):

Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)

Dim list = doc.DocumentNode.SelectNodes("//meta")

For Each node As HtmlNode In list
    Dim metaname As String = node.GetAttributeValue("name", String.Empty)
    If metaname <> String.Empty Then
        If metaname = "title" Then
            title = node.GetAttributeValue("content", String.Empty)
        ' more ElseIf branches
        End If
    End If
Next node

However, I found that //meta[@name='title'] gives me the same result:

Dim result As String = webClient.DownloadString(url)

Dim doc As New HtmlDocument()
doc.LoadHtml(result)

title = doc.DocumentNode.SelectNodes("//meta[@name='title']")(0).GetAttributeValue("content", String.Empty)

Thanks for putting me on the right track =D


2010-07-14 20:54:38 Eugene

Actually, a slightly better way is to use title = doc.DocumentNode.SelectSingleNode("//meta[@name='title']").GetAttributeValue("content", String.Empty) – Eugene 2010-07-14 21:09:23

Or better still, title = doc.DocumentNode.SelectSingleNode("//meta[@name='title']/@content") – Eugene 2010-07-14 21:15:30

The one above, title = doc.DocumentNode.SelectSingleNode("//meta[@name='title']/@content").ToString, does not work... – Eugene 2010-07-14 21:21:38

If you only want the meta tags that show the title, description and keywords, then use:

if (metaTags != null)
{
    foreach (var tag in metaTags)
    {
        // Skip tags that lack either attribute; && short-circuits.
        if ((tag.Attributes["name"] != null) && (tag.Attributes["content"] != null))
        {
            Panel divPage = new Panel();
            divPage.InnerHtml = divPage.InnerHtml + "<br /> " +
                "<b> Page " + tag.Attributes["name"].Value + " </b>: " +
                tag.Attributes["content"].Value + "<br />";
        }
    }
}

If you also want the og: tags from that link, add this after it:

if ((tag.Attributes["property"] != null) && (tag.Attributes["content"] != null))
{
    if (tag.Attributes["property"].Value == "og:image")
    {
        img.ImageUrl = tag.Attributes["content"].Value;
    }
}

This was a good experience, and I enjoyed adding this code :)


2015-07-23 08:49:16

With no error checking:

doc.DocumentNode.SelectSingleNode("//meta[@name='description']").Attributes["content"].Value;

Of course, this will throw a NullReferenceException if the node is null or if the content attribute does not exist.
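
One way to keep the one-liner while covering both failure cases is the null-conditional operator. The sketch below assumes C# 6 or later; the helper name GetMetaContent and the sample HTML are hypothetical and only serve to exercise the three cases (tag present, attribute missing, tag missing).

```csharp
using System;
using HtmlAgilityPack;

public static class SafeMetaLookup
{
    // Hypothetical helper: returns the content attribute of <meta name="...">,
    // or null when either the element or the attribute is missing.
    // The name is assumed to contain no single quote, because it is
    // concatenated into the XPath string.
    public static string GetMetaContent(HtmlDocument doc, string name)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='" + name + "']");

        // ?. stops at the first null instead of throwing.
        return node?.Attributes["content"]?.Value;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><head>" +
            "<meta name=\"description\" content=\"The entry page\" />" +
            "<meta name=\"keywords\" />" +
            "</head><body></body></html>");

        Console.WriteLine(GetMetaContent(doc, "description") ?? "(none)"); // tag and attribute present
        Console.WriteLine(GetMetaContent(doc, "keywords") ?? "(none)");    // content attribute missing
        Console.WriteLine(GetMetaContent(doc, "author") ?? "(none)");      // meta tag missing
    }
}
```

Returning null leaves it to the caller to decide on a fallback; node.GetAttributeValue("content", "") would be an alternative if an empty string is an acceptable default, although it still needs the null check on the node.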


2017-10-24 06:47:39
