Regex - getting the title

I'm trying to get a specific part of a title. I have a title that is structured like this:

<title>WebsiteName | Page title | Slogan</title>

Currently, in C#, I use this to get the title:

Regex.Match(pageSource, 
       @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>", 
       RegexOptions.IgnoreCase).Groups["Title"].Value;

But what I want to get out is just the page title.

Source

2013-05-08 ItsGreg

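Is it HTML you are parsing? – Anirudha 2013-05-08 17:46:55

What do you want to match in the title you provided? Just 'Page title'? – 2013-05-08 17:51:21

Break your problem down. Use some form of DOM parsing tool to parse the HTML. See the answers below. Then use a regex or simple string operations on the title content. – Mithon 2013-05-08 18:00:27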

Try this:

@"\<title[^>]*\>[^|]*\|\s*(?<Title>[^|]*?)\|[^<]*\</title\>" 

"\<title[^>]*\>"      //Opening title tag
"[^|]*"               //Everything up to the first pipe
"\|\s*"               //First pipe and any leading white space
"(?<Title>[^|]*?)"    //The page title section between the pipes
"\|"                  //Second pipe
"[^<]*"               //Everything after the second pipe up to the closing title tag
"\</title\>"          //Closing title tag
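
For illustration, here is a minimal sketch of that pattern applied to the sample title from the question. The class and method names (TitleDemo, MiddleSection) are made up for this example. Because the lazy group stops only at the second pipe, the captured text keeps its trailing space, so the sketch trims it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleDemo
{
    // Pattern from the answer above, unchanged.
    private const string Pattern =
        @"\<title[^>]*\>[^|]*\|\s*(?<Title>[^|]*?)\|[^<]*\</title\>";

    // Hypothetical helper: returns the section between the first two pipes,
    // trimmed, or an empty string when the pattern does not match.
    public static string MiddleSection(string pageSource)
    {
        Match m = Regex.Match(pageSource, Pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["Title"].Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        string html = "<title>WebsiteName | Page title | Slogan</title>";
        Console.WriteLine(MiddleSection(html)); // prints "Page title"
    }
}
```

Note that the pattern requires two pipes inside the title tag; a title without them yields no match, which this sketch reports as an empty string.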

Source

2013-05-08 17:54:42 Cemafor

Works like a charm! Thanks :) – ItsGreg 2013-05-11 17:33:33

If you just want to get the Page Title, try this:

\|(.*)\|

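The second group of the match (Groups[1], the first capture group) will contain the title, surrounding spaces included, if you pass in the string you provided. If you find yourself doing anything more complicated than this, then regex is probably not the tool for you. There are better ways to parse HTML.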
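
As a rough sketch of how that expression behaves in C# (the names PipeGroupDemo and BetweenPipes are invented for this example): the group is read with Groups[1] and trimmed. Since .* is greedy, a string with more than two pipes captures everything between the first and the last pipe, which the second call shows.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeGroupDemo
{
    // Hypothetical helper: returns capture group 1 of \|(.*)\| with the
    // surrounding whitespace trimmed, or an empty string when the input
    // contains fewer than two pipes.
    public static string BetweenPipes(string titleText)
    {
        Match m = Regex.Match(titleText, @"\|(.*)\|");
        return m.Success ? m.Groups[1].Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(BetweenPipes("WebsiteName | Page title | Slogan")); // prints "Page title"
        Console.WriteLine(BetweenPipes("A | B | C | D"));                     // prints "B | C" (greedy)
    }
}
```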

Source

2013-05-08 17:47:52

Avoid using regex to parse HTML.

You can use HtmlAgilityPack instead.

This will get you the HTML title:

HtmlDocument doc = new HtmlDocument(); 
doc.Load(yourStream);  
string title=doc.DocumentNode.SelectSingleNode("//title").InnerText;

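Now, to get the page title: assuming your title always has the same form as the example you gave, once you have the title text you can use this expression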

(?<=\|).+?(?=\|)
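
A small sketch of that lookaround expression in C#, assuming the title text has already been extracted. The literal string stands in for the InnerText value obtained above, and the names LookaroundDemo and FirstPipeSection are made up for this example. The lookbehind and lookahead keep the pipes out of the match, so Match.Value can be used directly and only needs trimming.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: returns the first run of text that sits between
    // two pipes, trimmed, or an empty string when there is no such run.
    public static string FirstPipeSection(string titleText)
    {
        Match m = Regex.Match(titleText, @"(?<=\|).+?(?=\|)");
        return m.Success ? m.Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        // Stand-in for the InnerText of the title node.
        string titleText = "WebsiteName | Page title | Slogan";
        Console.WriteLine(FirstPipeSection(titleText)); // prints "Page title"
    }
}
```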

Source

2013-05-08 17:49:20 Anirudha

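I think he wants the 'Page title' part inside the title tag? It's not entirely clear... – 2013-05-08 17:53:40

@AbeMiessler nice catch.. will edit the answer.. thanks – Anirudha 2013-05-08 17:58:07

My first thought was to use HAP, but I decided not to because I thought it would be slower.. – ItsGreg 2013-05-08 18:32:40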
