2010-01-13 79 views
5

Output the first two paragraphs from HTML stored in a string

I have HTML stored in a string variable in my C# .NET 2.0 code. Here is an example:

<div class="track"> 
    <img alt="" src="http://hits.guardian.co.uk/b/ss/guardiangu-feeds/1/H.20.3/30561?ns=guardian&pageName=Hundreds+feared+dead+in+Haiti+quake%3AArticle%3A1336252&ch=World+news&c3=GU.co.uk&c4=Haiti+%28News%29%2CDominican+Republic+%28News%29%2CCuba+%28News%29%2CBahamas+%28News%29%2CNatural+disasters+and+extreme+weather+%28News%29%2CEnvironment%2CWorld+news&c6=Rory+Carroll%2CHaroon+Siddique&c7=10-Jan-13&c8=1336252&c9=Article&c10=News&c11=World+news&c13=&c25=&c30=content&h2=GU%2FWorld+news%2FHaiti" width="1" height="1" /> 
</div> 
<p class="standfirst"> 
    • Tens of thousands lose homes in 7.0 magnitude quake<br /> 
    • UN headquarters, schools and hospitals collapse 
</p> 
<p> 
    René Préval, the president of Haiti, has described the devastation after last night's earthquake as "unimaginable" as governments and aid agencies around the world rushed into action. 
</p> 
<p> 
    Préval described how he had been forced to step over dead bodies and heard the cries of those trapped under the rubble of the national parliament. "Parliament has collapsed. The tax office has collapsed. Schools have collapsed. Hospitals have collapsed," <a href="http://www.miamiherald.com/582/story/1422279.html" title="he told the Miami Herald">he told the Miami Herald</a>. "There are a lot of schools that have a lot of dead people in them." Préval said he thought thousands of people had died in the quake. 
</p> 

I only want to output the first two paragraphs, as a substring of the original.

Can anyone help?

Answers

4

I ended up using this function...

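    // Requires: using System.Text.RegularExpressions;
    private string GetFirstParagraph(string htmltext)
    {
        // Capture the body of the first attribute-free <p> element, trimmed of
        // surrounding whitespace. Singleline lets '.' match across line breaks.
        Match m = Regex.Match(htmltext, @"<p>\s*(.+?)\s*</p>", RegexOptions.Singleline);
        if (m.Success)
        {
            return m.Groups[1].Value;
        }

        // No plain <p> found: fall back to the original HTML.
        return htmltext;
    }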
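
The function above returns only the inner text of the first attribute-free <p>. A rough sketch of how the same regex idea might be extended to return the first N paragraphs as a verbatim substring of the original is shown here. The names ParagraphExtractor and GetFirstParagraphs are made up for illustration, and the pattern assumes well-formed, non-nested <p>...</p> pairs; regex is not a general HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class ParagraphExtractor
{
    // Matches one <p ...>...</p> element, with or without attributes.
    // \b keeps <pre> and similar tags from matching; Singleline lets '.'
    // span line breaks; IgnoreCase accepts <P> as well.
    private static readonly Regex ParagraphPattern = new Regex(
        @"<p\b[^>]*>.*?</p>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the part of 'html' running from the start of the first <p>
    // element to the end of the count-th one, copied verbatim from the input.
    // Falls back to the whole input when nothing can be extracted.
    public static string GetFirstParagraphs(string html, int count)
    {
        MatchCollection matches = ParagraphPattern.Matches(html);
        if (matches.Count == 0 || count <= 0)
        {
            return html;
        }

        int last = Math.Min(count, matches.Count) - 1;
        int start = matches[0].Index;
        int end = matches[last].Index + matches[last].Length;
        return html.Substring(start, end - start);
    }
}
```

Calling ParagraphExtractor.GetFirstParagraphs(html, 2) on the sample from the question would count the standfirst paragraph as the first one. Whether that is the intended "first paragraph" depends on the feed, so the pattern may need tightening.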
0

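Are you using JavaScript? You could use explode on the p tag to get an array in which one element holds the div plus the first paragraph, and each following p tag sits in its own element.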
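
Since the question is about C#, the same splitting idea can be sketched with string.Split, which is roughly what explode does elsewhere. This is only an illustration: ParagraphSplitter and TakeThroughParagraph are hypothetical names, the split is case-sensitive, and it cuts on the closing </p> tag so that the pieces can be glued back together without losing any markup.

```csharp
using System;
using System.Text;

// Hypothetical helper illustrating the split-and-rejoin approach.
public static class ParagraphSplitter
{
    // Returns everything in 'html' up to and including the count-th "</p>",
    // so any markup ahead of the first paragraph (such as the tracking div)
    // is kept. Returns the whole input if it has fewer than 'count' closing tags.
    public static string TakeThroughParagraph(string html, int count)
    {
        if (count <= 0)
        {
            return string.Empty;
        }

        string[] separator = new string[] { "</p>" };
        string[] parts = html.Split(separator, StringSplitOptions.None);

        // k closing tags produce k + 1 pieces.
        if (parts.Length <= count)
        {
            return html;
        }

        StringBuilder result = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            result.Append(parts[i]);
            result.Append("</p>"); // restore the tag that Split removed
        }
        return result.ToString();
    }
}
```

Unlike a regex on the opening tag, this keeps whatever precedes the first paragraph, which may or may not be wanted.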

0

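You could write a few methods that load the HTML into a WebBrowser variable, then use the DOM to walk the nodes and extract whatever you want with your own custom logic. Take a look at this tutorial.

Here is a snippet showing how to create the WebBrowser in code-behind, rather than the way the tutorial tells you to do it: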

using System.Windows.Forms; 

WebBrowser _Browser = null; 
string _Source = "Your HTML goes here"; 

_Browser = new WebBrowser(); 
_Browser.Navigate("about:Blank"); 
_Browser.Document.OpenNew(true); 
_Browser.Document.Write(_Source); 
3

Take a look at the Html Agility Pack.

It exposes a very powerful API for parsing HTML, which you can use to extract the data you want.
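
A minimal sketch of what that might look like for this question, assuming the HtmlAgilityPack assembly is referenced in the project (it is a third-party library, not part of .NET). AgilityPackExample and GetFirstParagraphs are illustrative names only.

```csharp
using System.Text;
using HtmlAgilityPack; // third-party library, must be added to the project

// Hypothetical helper, shown only to illustrate the library's basic usage.
public static class AgilityPackExample
{
    // Concatenates the outer HTML of the first 'count' <p> elements,
    // in document order. Returns the input unchanged if there are none.
    public static string GetFirstParagraphs(string html, int count)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            return html;
        }

        StringBuilder result = new StringBuilder();
        for (int i = 0; i < paragraphs.Count && i < count; i++)
        {
            result.Append(paragraphs[i].OuterHtml);
        }
        return result.ToString();
    }
}
```

The XPath "//p" picks up every paragraph regardless of attributes. Something like "//p[not(@class)]" could be used instead if the standfirst paragraph should be skipped.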

+0

+1 Thanks for the link – 2010-01-13 18:37:40
