Using predicates in XPath with HtmlAgilityPack

I want to scrape data from a website. I am using HtmlAgilityPack (C#). The site's content looks like this:

<div id="list"> 
    <div class="list1"> 
    <a href="example1.com" class="href1" >A1</a> 
    <a href="example4.com" class="href2" /> 
    </div> 
    <div class="list2"> 
    <a href="example2.com" class="href1" >A2</a> 
    <a href="example5.com" class="href2" /> 
    </div> 
    <div class="list3"> 
    <a href="example3.com" class="href1" >A3</a> 
    <a href="example6.com" class="href2" /> 
    </div> 
    <div class="list3"> 
    <a href="example4.com" class="href1" >A4</a> 
    <a href="example6.com" class="href2" /> 
    </div> 
    <div class="list3"> 
    <a href="example5.com" class="href1" >A5</a> 
    <a href="example6.com" class="href2" /> 
    </div><div class="list3"> 
    <a href="example6.com" class="href1" >A6</a> 
    <a href="example6.com" class="href2" /> 
    </div><div class="list3"> 
    <a href="example3.com" class="href1" >A7</a> 
    <a href="example6.com" class="href2" /> 
    </div> 
</div>

Here there are 7 links with class="href1". I only want to grab 3 of them (the 3rd through the 5th). How can I get these specific links?

2012-02-16 Sagar Kadam

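Your data already looks like well-formed XML. If you are parsing XHTML pages, you can probably get away with the .NET Framework's System.Xml classes. For example, to load the data into an XElement (from System.Xml.Linq), you could use: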

XElement xElement = XElement.Parse(@" 
    <div id=""list""> 
     <div class=""list1""> 
      <a href=""example1.com"" class=""href1"" >A1</a> 
      <a href=""example4.com"" class=""href2"" /> 
     </div> 
     <div class=""list2""> 
      <a href=""example2.com"" class=""href1"" >A2</a> 
      <a href=""example5.com"" class=""href2"" /> 
     </div> 
     <div class=""list3""> 
      <a href=""example3.com"" class=""href1"" >A3</a> 
      <a href=""example6.com"" class=""href2"" /> 
     </div> 
     <div class=""list3""> 
      <a href=""example4.com"" class=""href1"" >A4</a> 
      <a href=""example6.com"" class=""href2"" /> 
     </div> 
     <div class=""list3""> 
      <a href=""example5.com"" class=""href1"" >A5</a> 
      <a href=""example6.com"" class=""href2"" /> 
     </div> 
     <div class=""list3""> 
      <a href=""example6.com"" class=""href1"" >A6</a> 
      <a href=""example6.com"" class=""href2"" /> 
     </div> 
     <div class=""list3""> 
      <a href=""example3.com"" class=""href1"" >A7</a> 
      <a href=""example6.com"" class=""href2"" /> 
     </div> 
    </div>");

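Then, to select the third through fifth <a> elements whose class attribute is href1, use the following (it needs using directives for System.Linq and System.Xml.XPath):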

var links = xElement.XPathSelectElements("//a[@class='href1']").Skip(2).Take(3).ToList();
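
For reference, a minimal self-contained sketch of the XElement approach, including the using directives it depends on, could look like the following. It uses C# top-level statements and a shortened copy of the markup; both are assumptions made for illustration rather than part of the original answer.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // provides the XPathSelectElements extension method

// Shortened stand-in for the markup in the question (assumption for illustration).
XElement xElement = XElement.Parse(@"
<div id=""list"">
  <div class=""list1""><a href=""example1.com"" class=""href1"">A1</a></div>
  <div class=""list2""><a href=""example2.com"" class=""href1"">A2</a></div>
  <div class=""list3""><a href=""example3.com"" class=""href1"">A3</a></div>
  <div class=""list3""><a href=""example4.com"" class=""href1"">A4</a></div>
  <div class=""list3""><a href=""example5.com"" class=""href1"">A5</a></div>
  <div class=""list3""><a href=""example6.com"" class=""href1"">A6</a></div>
  <div class=""list3""><a href=""example3.com"" class=""href1"">A7</a></div>
</div>");

// Select every matching link in document order, then keep the 3rd through 5th.
var links = xElement.XPathSelectElements("//a[@class='href1']")
    .Skip(2)
    .Take(3)
    .ToList();

foreach (XElement link in links)
{
    // Print the link text and its href attribute.
    Console.WriteLine(link.Value + " -> " + link.Attribute("href")?.Value);
}
```

Skip(2).Take(3) does the positional filtering in LINQ rather than in XPath, so the XPath expression itself stays simple.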

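On the other hand, if you have an HtmlAgilityPack.HtmlDocument instance, you can run the XPath query like this (note that SelectNodes returns null by default when nothing matches):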

HtmlNodeCollection links = htmlDoc.DocumentNode.SelectNodes("//a[@class='href1']"); 
var links3to5 = links.Cast<HtmlNode>().Skip(2).Take(3).ToList();
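
Since SelectNodes returns null by default when the query matches nothing, a slightly more defensive variant of the HtmlAgilityPack approach might look like the sketch below. The inline HTML string and the variable names are assumptions for illustration, and the code assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

// Shortened stand-in for the page in the question (assumption for illustration).
const string html = @"
<div id=""list"">
  <div class=""list1""><a href=""example1.com"" class=""href1"">A1</a></div>
  <div class=""list2""><a href=""example2.com"" class=""href1"">A2</a></div>
  <div class=""list3""><a href=""example3.com"" class=""href1"">A3</a></div>
  <div class=""list3""><a href=""example4.com"" class=""href1"">A4</a></div>
  <div class=""list3""><a href=""example5.com"" class=""href1"">A5</a></div>
  <div class=""list3""><a href=""example6.com"" class=""href1"">A6</a></div>
</div>";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// SelectNodes yields null (not an empty collection) by default when nothing matches.
HtmlNodeCollection allLinks = htmlDoc.DocumentNode.SelectNodes("//a[@class='href1']");

// Fall back to an empty list so the loop below is always safe.
List<HtmlNode> links3to5 = allLinks == null
    ? new List<HtmlNode>()
    : allLinks.Skip(2).Take(3).ToList();

foreach (HtmlNode link in links3to5)
{
    Console.WriteLine(link.InnerText + " -> " + link.GetAttributeValue("href", ""));
}
```

HtmlNodeCollection already implements IList<HtmlNode>, so the LINQ operators can be applied to it directly.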

2012-02-16 19:10:50 Douglas

This code:

HtmlDocument doc = new HtmlDocument();
doc.Load(myHtmlFile);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes(
    "//div[@class='list3' and position() > 2 and position() < 6]/a[@class='href1']"))
{
    Console.WriteLine("node:" + node.InnerText);
}

will give you this result:

node:A3 
node:A4 
node:A5
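
One thing worth keeping in mind about the query above: inside the //div step, position() counts each div among its sibling div elements, not among the matching links. It yields A3 to A5 here because every div holds exactly one href1 link and the third through fifth divs all have class list3. If the layout differed, wrapping the link query in parentheses and applying the positional predicate to the whole node-set would be an alternative. The sketch below contrasts the two using System.Xml.Linq on made-up markup (an assumption for illustration) in which the first div contains two links; HtmlAgilityPack evaluates XPath 1.0 as well, so the same expression should behave similarly there, though that is not verified here.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // provides the XPathSelectElements extension method

// Hypothetical markup: the first div holds TWO href1 links, so div position
// and link position no longer line up.
XDocument doc = XDocument.Parse(@"
<div id=""list"">
  <div class=""list1""><a class=""href1"">A1</a><a class=""href1"">A2</a></div>
  <div class=""list3""><a class=""href1"">A3</a></div>
  <div class=""list3""><a class=""href1"">A4</a></div>
  <div class=""list3""><a class=""href1"">A5</a></div>
  <div class=""list3""><a class=""href1"">A6</a></div>
</div>");

// Positional predicate on the div step: picks the 3rd to 5th sibling divs.
string byDivPosition = string.Join(",", doc
    .XPathSelectElements("//div[@class='list3' and position() > 2 and position() < 6]/a[@class='href1']")
    .Select(a => a.Value));

// Positional predicate on the parenthesized node-set: picks the 3rd to 5th links.
string byLinkPosition = string.Join(",", doc
    .XPathSelectElements("(//a[@class='href1'])[position() >= 3 and position() <= 5]")
    .Select(a => a.Value));

Console.WriteLine(byDivPosition);  // A4,A5,A6
Console.WriteLine(byLinkPosition); // A3,A4,A5
```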

2012-02-16 19:21:29

Thank you very much. – 2012-02-16 20:25:24
