Capturing the rel, type and href of a link in C#

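I have a string that should contain a list of items in the form <link rel="{0}" type="{1}" href="{2}">, where {0}, {1} and {2} are strings, and I basically want to extract them.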

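I need this as part of an HTML parsing problem, and I've heard that parsing HTML with regular expressions is a bad idea. (Like here)

I'm not even sure how to do this with regular expressions.

This is as far as I got: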

string format = "<link rel=\".*\" type=\".*\" href=\".*\">"; 
Regex reg = new Regex(format); 
MatchCollection matches = reg.Matches(input, 0); 
foreach (Match match in matches) 
{ 
     string rel = string.Empty; 
     string type = string.Empty; 
     string href = string.Empty; 
     //not sure what to do here to get these values for each from the match 
}

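My research turned up the link above, so I may be completely on the wrong track by using regular expressions.

How would you do this, either with my chosen method or with an HTML parser?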

2009-06-18 James W

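You'd be better off using a real HTML parser, such as the one in the HTML Agility Pack. You can get it here.

The main reason not to use regular expressions for HTML parsing is that the markup may not be well formed (which is almost always the case), and that can break a regex-based parser.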
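To illustrate the point, here is a minimal sketch of the regex approach from the question, using named groups to pull out the three values. The class and method names (LinkRegexDemo, Extract) are made up for this example, and the pattern assumes the attributes always appear in exactly the order rel, type, href. It works on the exact form but silently fails as soon as the markup deviates.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class LinkRegexDemo
{
    // Named groups capture each attribute value. [^"]* stops at the closing
    // quote, unlike the greedy .* used in the question.
    private static readonly Regex LinkPattern = new Regex(
        "<link rel=\"(?<rel>[^\"]*)\" type=\"(?<type>[^\"]*)\" href=\"(?<href>[^\"]*)\">");

    // Returns "rel|type|href" for the first match, or null when nothing matches.
    public static string Extract(string input)
    {
        Match match = LinkPattern.Match(input);
        if (!match.Success)
        {
            return null;
        }
        return match.Groups["rel"].Value + "|" +
               match.Groups["type"].Value + "|" +
               match.Groups["href"].Value;
    }

    public static void Main()
    {
        // Exactly the expected form: the pattern matches.
        Console.WriteLine(Extract(
            "<link rel=\"stylesheet\" type=\"text/css\" href=\"site.css\">"));

        // Same element with reordered attributes and a self-closing slash: no match.
        Console.WriteLine(Extract(
            "<link href=\"site.css\" rel=\"stylesheet\" type=\"text/css\" />") ?? "(no match)");
    }
}
```

Both inputs describe the same link, yet only the first is recognised. Handling attribute order, single quotes, extra whitespace and missing attributes in the pattern quickly becomes unmanageable, which is where a parser helps.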

With the parser, you then use XPath to get the nodes you want and load their values into variables.

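// Requires the HtmlAgilityPack library (using HtmlAgilityPack;)
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(pageMarkup);
// SelectNodes returns null when nothing matches the XPath expression
HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//link");
string rel = string.Empty;

if (nodes != null && nodes[0].Attributes["rel"] != null)
{
    // The indexer returns an HtmlAttribute; its Value property holds the string
    rel = nodes[0].Attributes["rel"].Value;
}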
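The snippet above reads only rel from the first node. As a rough sketch of how it could be extended to collect rel, type and href from every link element, something like the following should work. It assumes the HtmlAgilityPack NuGet package is referenced and a compiler with tuple support (C# 7 or later); LinkExtractor and GetLinks are names invented for this example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET base library

// Hypothetical helper class; the names are illustrative only.
public static class LinkExtractor
{
    // Returns one (rel, type, href) tuple per <link> element found in the markup.
    public static List<(string Rel, string Type, string Href)> GetLinks(string pageMarkup)
    {
        var results = new List<(string Rel, string Type, string Href)>();
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(pageMarkup);

        // SelectNodes returns null rather than an empty collection when nothing matches.
        HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//link");
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            // GetAttributeValue falls back to the supplied default when the attribute is absent.
            results.Add((
                node.GetAttributeValue("rel", string.Empty),
                node.GetAttributeValue("type", string.Empty),
                node.GetAttributeValue("href", string.Empty)));
        }
        return results;
    }
}
```

Because the parser works on attributes by name, attribute order, quoting style and missing attributes no longer matter; a missing attribute simply comes back as an empty string here.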

2009-06-18 19:12:46

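Thanks. I gave you the check mark because your answer had helpful code and you explained why to use a parser instead of regular expressions. Thanks to Rony for the link to the HTML Agility Pack; I just downloaded it. – 2009-06-18 19:29:51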

Parse your HTML using the HTML Agility Pack library, which can be found here.

2009-06-18 18:59:31 Rony

Thanks for the link. – 2009-06-18 19:34:01
