HTML Agility Pack: getting text from paragraph tags

I am trying to get the text of the paragraph tags inside a div using HtmlAgilityPack 2.28 in a Windows Phone 8.1 app.

The structure of the div is:

<div id="55"> 

<p>&nbsp;</p> 

<p><span class="dropcap">W 

</span><span class="zw-portion"><strong>ith the start of festive season in India</strong>, we 
will also witness the f<strong>irst London Derby</strong> of the season  
between the newly London rivals <strong>Chelsea and Arsenal</strong>. It will be a great chance 
for Arsene Wenger to get rid of his <strong>1000</strong></span> 

<strong><span class="zw-portion">th</span><span class="zw-portion"> managed </span> 

<span class="zw-portion">6-0 </span> 

<span class="zw-portion">massacre</span></strong> 

<span class="zw-portion"> in March,</span> 

<span class="zw-portion">&nbsp;</span> 

<span class="zw-portion">while the Special One will be eager to continue his winning rampage 
</span> 

<span class="zw-portion">&nbsp;</span> 

<span class="zw-portion">over his “<strong>Specialist in Failure</strong>” counterpart. Although 
both clubs can boast of being unbeaten this season and both clubs can take this opportunity 
</span> 

<span class="zw-portion"> to bring down their rival</span><span class="zw-portion">.</span></p> 

<p>&nbsp;</p> 

<p><iframe width="640" height="360" src="https://www.youtube.com/embed/zFBN8M1pCxo? 
feature=oembed" frameborder="0" allowfullscreen=""></iframe></p> 

<p class="zw-paragraph" data-textformat=" 
{&quot;type&quot;:&quot;text&quot;,&quot;td&quot;:&quot;none&quot;}"></p> 

<p class="zw-paragraph" data-textformat=" 
{&quot;type&quot;:&quot;text&quot;,&quot;td&quot;:&quot;none&quot;}"> 

<span class="zw-portion">The rivalry between Chelsea and Arsenal was not as a primary London 
Derby, until Chelsea rose to top of Premier League in 2000’s, when they consistently competed 
against each other. The rivalry between the two clubs rose higher as compared to their 
traditional rivals. Both the clubs rivalry are now not only limited to their pitch but has also 
been to the fans. In 2009 survey by Football Fans Census, Arsenal fans named Chelsea as the 

<strong>most disliked club</strong> </span> 

<span class="zw-portion"> ahead of their traditional rivals <strong>Manchest</strong></span> 
<strong> <span class="zw-portion">er United and Tottenham Hotspur</span></strong> 

<span class="zw-portion">. However the report of the other camp doesn’t differ much as Chelsea 
fans ranks Arsenal as their <strong>second most-disliked club</strong></span> 

<strong><span class="zw-portion">. 
</span></strong></p> 
</div>

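I want to extract only the text contained in the paragraph elements inside the div. So far I have written the following code, where feedurl holds the address of the page to extract data from (the correct address is retrieved). I then try to get a reference to the div by its id, which is always 55.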

var feedurl = GetValue("feedurl"); 
string htmlPage = "asdsad"; 
HtmlDocument htmldoc = new HtmlDocument(); 
htmldoc.LoadHtml(feedurl); 
htmldoc.OptionUseIdAttribute=true; 
HtmlNode div = htmldoc.GetElementbyId("55"); 
if (div != null) 
{ 
    htmlPage += "done"; 
} 

_content = htmlPage; 
return _content;

htmldoc.GetElementbyId("55"); returns a null reference. I have read about using htmldoc.DocumentNode.SelectNodes([arguments]), but no SelectNodes method is available to me. I am lost as to how to proceed. Please help.

Source

2014-10-05 user3263192

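The WP 8.1 build of HtmlAgilityPack does not support SelectNodes(), because that method needs an XPath implementation, which is unfortunately missing from the .NET version for WP 8.1.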

The solution is to use HtmlAgilityPack's LINQ API instead of XPath. For example, to get the <div> element whose id attribute equals 55:

HtmlNode div55 = htmldoc.DocumentNode 
         .Descendants("div") 
         .FirstOrDefault(o => o.GetAttributeValue("id", "") 
                == "55");
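
Building on the same LINQ approach, the following is a minimal sketch of how the paragraph text inside that div might be collected. The class name ParagraphExtractor and the method name GetParagraphTexts are hypothetical, chosen only for illustration, and the sketch assumes that HtmlEntity.DeEntitize is available in the WP 8.1 build of the library and that the html argument already contains markup rather than a URL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ParagraphExtractor
{
    // Returns the text of every non-empty <p> inside the <div> whose id equals divId.
    public static List<string> GetParagraphTexts(string html, string divId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml parses markup; it does not download anything

        HtmlNode div = doc.DocumentNode
            .Descendants("div")
            .FirstOrDefault(d => d.GetAttributeValue("id", "") == divId);

        if (div == null)
        {
            // The div was not found, so there is nothing to extract
            return new List<string>();
        }

        return div.Descendants("p")
            // InnerText concatenates the text of nested <span> and <strong> nodes;
            // DeEntitize turns entities such as &nbsp; into real characters
            .Select(p => HtmlEntity.DeEntitize(p.InnerText))
            // Treat non-breaking spaces as ordinary spaces, then trim
            .Select(t => t.Replace('\u00A0', ' ').Trim())
            // Skip paragraphs that hold only whitespace, e.g. <p>&nbsp;</p>
            .Where(t => t.Length > 0)
            .ToList();
    }
}
```

Calling ParagraphExtractor.GetParagraphTexts(html, "55") would then give one string per paragraph, which can be joined with line breaks if a single block of text is wanted.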

Source

2014-10-06 00:35:43 har07

I will have to use System.Linq, right? If after this I use 'if (div55 != null) { do something }', it does nothing. If I use 'div55.InnerText', I get a NullReferenceException. – user3263192 2014-10-06 09:14:28

Make sure you are loading the HTML into the 'HtmlDocument' correctly (you can check the 'DocumentNode.OuterHtml' property to see whether it contains the expected HTML markup). – har07 2014-10-06 10:28:58

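'DocumentNode.OuterHtml' returns the page URL stored in the feedurl variable. Is that right? Forgive me for such a silly question; I am new to this and could not find an answer online. – user3263192 2014-10-06 10:46:43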
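
The last comment suggests that LoadHtml was given the URL string itself: LoadHtml parses whatever string it receives as markup and does not fetch anything, so OuterHtml simply echoes the URL. A minimal sketch of downloading the page first is shown below. The class name PageLoader and its method names are hypothetical, and the sketch assumes System.Net.Http.HttpClient is available to the app.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class PageLoader
{
    // Downloads the markup found at pageUrl and parses it into an HtmlDocument.
    public static async Task<HtmlDocument> LoadFromUrlAsync(string pageUrl)
    {
        using (var client = new HttpClient())
        {
            // Fetch the page body as a string; this is the HTML that LoadHtml expects
            string html = await client.GetStringAsync(pageUrl);
            return Parse(html);
        }
    }

    // Parses a string that already contains HTML markup.
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```

With this in place, something like 'HtmlDocument htmldoc = await PageLoader.LoadFromUrlAsync(feedurl);' would replace the direct 'htmldoc.LoadHtml(feedurl)' call, after which the LINQ lookup for the div has real markup to search.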
