2014-09-26 58 views

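Parsing a web page with HAP and LINQ

I'm trying to create a WP 8.1 app that needs the content of a web page. My problem is that XPath doesn't seem to work on WP 8.1, so I'm trying to use LINQ instead, but I don't understand it very well. The web page looks like this: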

<body> 
    <table cellspacing="0" cellpadding="0" border="0" style="border-style:none; padding:0; margin:0;" id="ctl00_ContentPlaceHolder1_ListView1_groupPlaceholderContainer">    
     <tbody> 
      <tr style="border-style:none;padding:0; margin:0; background-image:none; vertical-align:top;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_itemPlaceholderContainer">   
       <td style="border-style:none;padding:0; margin:0; width:22%;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_ctl01_Td3"> 
        <div class="photo"> 
         <a target="_self" title="PH1" href="fumetto.aspx?Fumetto=279277">PH1_1</a> 
        </div> 
       </td> 
      </tr> 
      <tr style="border-style:none;padding:0; margin:0; background-image:none; vertical-align:top;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_itemPlaceholderContainer">   
       <td style="border-style:none;padding:0; margin:0; width:22%;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_ctl01_Td3"> 
        <div class="photo"> 
         <a target="_self" title="PH2" href="fumetto.aspx?Fumetto=279277">PH2_1</a> 
        </div> 
       </td> 
      </tr> 
      <tr style="border-style:none;padding:0; margin:0; background-image:none; vertical-align:top;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_itemPlaceholderContainer">   
       <td style="border-style:none;padding:0; margin:0; width:22%;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_ctl01_Td3"> 
        <div class="photo"> 
         <a target="_self" title="PH3" href="fumetto.aspx?Fumetto=279277">PH3_1</a> 
        </div> 
       </td> 
      </tr> 
     </tbody> 
    </table> 
</body> 

I want to save the title attributes "PH1", "PH2", "PH3" and the link texts "PH1_1", "PH2_1", "PH3_1". Can you help me? My code is this:

string filePath = "..."; 
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); 
htmlDoc.OptionFixNestedTags = true; 
htmlDoc.LoadHtml(filePath); 
if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count() > 0) 
{ 
    // Handle any parse errors as required 
} 
else 
{ 
    if (htmlDoc.DocumentNode != null) 
    { 
     //I'm trying to get the first node for now 
     HtmlAgilityPack.HtmlNode aNode = htmlDoc.DocumentNode.DescendantsAndSelf("a").FirstOrDefault(); 
     if (aNode != null) 
     { 
      string first = aNode.GetAttributeValue("title", "null"); 
      string value = aNode.ToString(); 
      ... 
     } 
    } 
} 

What's wrong with your current code? – har07 2014-09-26 13:45:26


The problem is that `first` is `"null"` and `value` is `"htmlAgilitypack"`. – 2014-09-26 13:47:57

Answer


Try replacing DescendantsAndSelf() with Descendants():

HtmlAgilityPack.HtmlNode aNode = htmlDoc.DocumentNode 
             .Descendants("a") 
             .FirstOrDefault(); 

And instead of calling ToString(), use the InnerText property to get the text between the opening and closing tags:

if (aNode != null) 
{ 
    string first = aNode.GetAttributeValue("title", "null"); 
    string value = aNode.InnerText; 
    ..... 
} 

[.NET fiddle demo]
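
The snippets above only read the first link. Since the question asks for all three title/text pairs, here is a minimal sketch that collects every link under a `div class="photo"`. It assumes the page markup has already been fetched into a string. The names `PhotoLinkParser` and `ExtractPhotoLinks` are made up for illustration; the only HAP calls used are `LoadHtml`, `Descendants`, `GetAttributeValue` and `InnerText`.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class PhotoLinkParser
{
    // Hypothetical helper: returns one (title, text) pair per <a> found
    // inside a <div class="photo">. The html argument must be the page
    // markup itself, not a path or URL.
    public static List<KeyValuePair<string, string>> ExtractPhotoLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div")
            // Keep only the wrappers used for the photo links in the sample page.
            .Where(div => div.GetAttributeValue("class", "") == "photo")
            .SelectMany(div => div.Descendants("a"))
            // Key = title attribute, Value = text between <a> and </a>.
            .Select(a => new KeyValuePair<string, string>(
                a.GetAttributeValue("title", ""),
                a.InnerText.Trim()))
            .ToList();
    }
}
```

With the sample markup from the question, this should give the pairs PH1/PH1_1, PH2/PH2_1 and PH3/PH3_1. Filtering on the `photo` class first is a design choice to skip any unrelated links elsewhere on the page; drop the `Where` clause if every `<a>` is wanted.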


`aNode` is now `null`. If I keep `DescendantsAndSelf()`, I get `null` for `first` and the link of the page for `value`. – 2014-09-26 13:53:43


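Not sure about `DescendantsAndSelf()`; it somehow returns the wrong element for me (maybe a bug in HAP, I didn't check further). But `Descendants()` should work, see the demo [on dotnetfiddle](https://dotnetfiddle.net/61fc10) – har07 2014-09-26 14:00:21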
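
A possible explanation for the `null` result, offered as a guess because the value of `filePath` is elided in the question: `HtmlDocument.LoadHtml` parses its argument as HTML markup and does not open a file or URL. If `filePath` holds an address, the parsed document contains only that text and no `<a>` elements, which would match getting "the link of the page" back as the value. A minimal sketch of downloading the markup first follows; it assumes `System.Net.Http.HttpClient` is available to the app, and `PageLoader`, `LoadPageAsync` and `pageUrl` are hypothetical names.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class PageLoader
{
    // pageUrl is a placeholder; the real address is not given in the question.
    public static async Task<HtmlDocument> LoadPageAsync(string pageUrl)
    {
        using (var client = new HttpClient())
        {
            // Download the markup first; LoadHtml expects HTML text.
            string html = await client.GetStringAsync(pageUrl);

            var doc = new HtmlDocument();
            doc.OptionFixNestedTags = true;
            doc.LoadHtml(html);
            return doc;
        }
    }
}
```

After awaiting `LoadPageAsync`, `doc.DocumentNode.Descendants("a").FirstOrDefault()` should find the first link, provided the downloaded page actually contains one.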


Another question. How do I get the html node if I have something like this? I always get the first node instead of the second. `<!DOCTYPE HTML> ...` – 2014-09-26 16:28:00