C#: replacing custom tags

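I have a text editor similar to the one used on Stack Overflow. I'm working with text strings in C#, but I also want to let users format the text with custom tags. For example: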

<year /> will output the current year. 
"Hello <year /> World" would render Hello 2012 World

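What I want to do is create a regular expression that finds every occurrence of <year /> in a string and replaces it. On top of that, I'd like to add attributes to the tag and be able to extract them, e.g. <year offset="2" format="5" />. I don't know much about regex, but I'm hoping someone knows how to do this.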

Thanks


2011-09-21 tmutton

Is your document *actually* XML? That would make it easier... –

You need to escape the characters; your markup isn't coming through. – Hammerstein

It's just a C# string. – tmutton

Ideally you shouldn't be using regex for this, but since the Html Agility Pack doesn't have an HtmlReader, I guess you have to.

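That said, other markup solutions often use a list of regex patterns, each with an associated replacement, so we shouldn't write a 'general' case (e.g. <([A-Z][A-Z0-9]*)>.*?</\1> would be the wrong thing to do here; instead we want <year>.*?</year>).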
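As a minimal sketch of the tag-specific approach applied to the question's first requirement, the snippet below assumes only the self-closing form <year /> (with optional whitespace, any letter case) needs replacing. The YearTag class and Expand method are hypothetical names; the year is passed in instead of read from DateTime.Now so the result is predictable.

```csharp
using System.Text.RegularExpressions;

public static class YearTag
{
    // Tag-specific pattern: matches "<year/>", "<year />", "<YEAR   />" and so on.
    // It deliberately does not try to recognise arbitrary tags.
    private static readonly Regex SelfClosingYear =
        new Regex(@"<year\s*/>", RegexOptions.IgnoreCase);

    // Replaces every occurrence of the self-closing year tag with the given year.
    public static string Expand(string input, int year)
    {
        return SelfClosingYear.Replace(input, year.ToString());
    }
}
```

In an application you would call YearTag.Expand(text, DateTime.Now.Year); attributes are out of scope for this pattern, which is why the rest of this answer builds the attribute-aware version.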

First, you might create a class to hold the information about a recognised token, for example:

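using System;
using System.Collections.Generic;

// Holds the attributes and inner text of one recognised tag.
public class Token
{
    // Attribute names are compared case-insensitively (offset == OFFSET).
    private readonly Dictionary<string, string> _attributes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public string InnerText { get; private set; }

    // Returns the raw attribute value, or null if the attribute is absent.
    public string this[string attributeName]
    {
        get
        {
            string val;
            _attributes.TryGetValue(attributeName, out val);
            return val;
        }
    }

    public Token(string innerText, IEnumerable<KeyValuePair<string, string>> values)
    {
        InnerText = innerText;
        foreach (var item in values)
        {
            // Indexer instead of Add(): a repeated attribute overwrites the earlier
            // one instead of throwing an ArgumentException.
            _attributes[item.Key] = item.Value;
        }
    }

    // Parses an integer attribute, falling back to defaultValue if it is missing or not a number.
    public int GetInteger(string name, int defaultValue)
    {
        string val;
        int result;
        if (_attributes.TryGetValue(name, out val) && int.TryParse(val, out result))
            return result;
        return defaultValue;
    }
}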

Now we need to create the regexes. For example, a regex that matches your year element would look like this:

<Year(?>\s*(?<aname>\w*?)\s*=\s*"(?<aval>[^"]*)"\s*)*>(?<itext>.*?)</Year>
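This pattern relies on a .NET feature worth seeing in isolation: a named group inside a repeated group keeps every capture in Group.Captures, not just the last one, so the i-th aname capture lines up with the i-th aval capture. A small sketch using the pattern above verbatim (the YearPatternDemo class and ReadAttributes helper are hypothetical):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class YearPatternDemo
{
    // The tag-specific pattern from above, as a C# verbatim string ("" is a literal quote).
    private static readonly Regex YearElement = new Regex(
        @"<Year(?>\s*(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*>(?<itext>.*?)</Year>");

    // Returns the attribute name/value pairs of the first <Year ...>...</Year> element, in order.
    public static List<KeyValuePair<string, string>> ReadAttributes(string input, out string innerText)
    {
        var result = new List<KeyValuePair<string, string>>();
        Match m = YearElement.Match(input);
        innerText = m.Success ? m.Groups["itext"].Value : null;
        if (!m.Success)
            return result;

        // Each repetition of the attribute group adds one capture to "aname" and one to "aval".
        CaptureCollection names = m.Groups["aname"].Captures;
        CaptureCollection values = m.Groups["aval"].Captures;
        for (int i = 0; i < names.Count; i++)
            result.Add(new KeyValuePair<string, string>(names[i].Value, values[i].Value));
        return result;
    }
}
```

Without the Captures collection, m.Groups["aname"].Value would only return the last attribute name, which is why the MyMarkup class iterates over Captures.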

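So we can generalise this into two patterns, where {0} stands for the tag name (the first for tags with inner text, the second for self-closing tags):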

<{0}\s*(?>(?<aname>\w*?)\s*=\s*"(?<aval>[^"]*)"\s*)*>(?<itext>.*?)</{0}> 
<{0}\s*(?>(?<aname>\w*?)\s*=\s*"(?<aval>[^"]*)"\s*)*/>
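A sketch of how the self-closing template becomes a concrete regex via string.Format, and how several attributes (as asked for in the comments) come out of one match. The TagPattern class, ForTag and Attributes are hypothetical names; escaping the tag name with Regex.Escape follows the MyMarkup class below.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagPattern
{
    // Generic self-closing template from above; {0} is replaced by the escaped tag name.
    private const string SelfClosingTemplate =
        @"<{0}\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*/>";

    // Builds a case-insensitive regex for one specific tag name.
    public static Regex ForTag(string tagName)
    {
        return new Regex(string.Format(SelfClosingTemplate, Regex.Escape(tagName)),
                         RegexOptions.IgnoreCase);
    }

    // Collects the attributes of a single match into a case-insensitive dictionary.
    public static Dictionary<string, string> Attributes(Match match)
    {
        var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        CaptureCollection names = match.Groups["aname"].Captures;
        CaptureCollection values = match.Groups["aval"].Captures;
        for (int i = 0; i < names.Count; i++)
            attrs[names[i].Value] = values[i].Value; // a repeated attribute overwrites the earlier one
        return attrs;
    }
}
```

For example, TagPattern.ForTag("year").Match("Hello <year offset=\"2\" format=\"5\" /> World") matches the whole tag, and Attributes on that match yields offset = 2 and format = 5. Note that the pattern does not match <yearly />, because the tag name must be followed by whitespace, an attribute, or />.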

Given these generic tag regexes, we can write the markup class:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class MyMarkup
{
    // These are used to build up the regex; {0} is replaced by the escaped tag name.
    const string RegexInnerText = @"<{0}\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*>(?<itext>.*?)</{0}>";
    const string RegexNoInnerText = @"<{0}\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*/>";

    // Each registered tag contributes one or two (regex, evaluator) pairs, applied in order.
    private static readonly LinkedList<Tuple<Regex, MatchEvaluator>> _replacers = new LinkedList<Tuple<Regex, MatchEvaluator>>();

    static MyMarkup()
    {
        // <year /> or <year digits="2" />: the current year, trimmed to its last N digits.
        Register("year", false, tok =>
        {
            var count = tok.GetInteger("digits", 4);
            var yr = DateTime.Now.Year.ToString();
            // Guard against negative values, which would make Substring throw.
            if (count >= 0 && yr.Length > count)
                yr = yr.Substring(yr.Length - count);
            return yr;
        });
    }

    private static void Register(string tagName, bool supportsInnerText, Func<Token, string> replacement)
    {
        var eval = CreateEvaluator(replacement);

        // Add the no-inner-text (self-closing) variant.
        _replacers.AddLast(Tuple.Create(CreateRegex(tagName, RegexNoInnerText), eval));
        // Add the inner-text variant.
        if (supportsInnerText)
            _replacers.AddLast(Tuple.Create(CreateRegex(tagName, RegexInnerText), eval));
    }

    private static Regex CreateRegex(string tagName, string format)
    {
        return new Regex(string.Format(format, Regex.Escape(tagName)), RegexOptions.Compiled | RegexOptions.IgnoreCase);
    }

    // Runs every registered replacer over the input, in registration order.
    public static string Execute(string input)
    {
        foreach (var replacer in _replacers)
            input = replacer.Item1.Replace(input, replacer.Item2);
        return input;
    }

    private static MatchEvaluator CreateEvaluator(Func<Token, string> replacement)
    {
        return match =>
        {
            // Grab the groups/values.
            var aname = match.Groups["aname"];
            var aval = match.Groups["aval"];
            var itext = match.Groups["itext"].Value;

            // Pair the i-th captured attribute name with the i-th captured value.
            var attrs = Enumerable.Range(0, aname.Captures.Count)
                .Select(i => new KeyValuePair<string, string>(aname.Captures[i].Value, aval.Captures[i].Value));

            return replacement(new Token(itext, attrs));
        };
    }
}

This is all rather rough, but it should give you a good idea of what you need to do.
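The MyMarkup example handles a digits attribute; the question asks about offset instead. Below is a self-contained sketch of the same regex-plus-MatchEvaluator idea for that case, assuming offset means "number of years to add to the current year" (the question does not define it). YearMarkup and Render are hypothetical names; the year is passed in rather than read from DateTime.Now so the output is predictable, and other attributes such as format are parsed but ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class YearMarkup
{
    // Self-closing <year ... /> with any number of name="value" attributes.
    private static readonly Regex YearTag = new Regex(
        @"<year\s*(?>(?<aname>\w*?)\s*=\s*""(?<aval>[^""]*)""\s*)*/>",
        RegexOptions.IgnoreCase);

    // Replaces each year tag; offset="n" (assumed meaning: add n years) is optional.
    public static string Render(string input, int currentYear)
    {
        return YearTag.Replace(input, match =>
        {
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            CaptureCollection names = match.Groups["aname"].Captures;
            CaptureCollection values = match.Groups["aval"].Captures;
            for (int i = 0; i < names.Count; i++)
                attrs[names[i].Value] = values[i].Value;

            // A missing or non-numeric offset falls back to 0
            // (int.TryParse sets offset to 0 when parsing fails).
            int offset = 0;
            string raw;
            if (attrs.TryGetValue("offset", out raw))
                int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out offset);

            return (currentYear + offset).ToString(CultureInfo.InvariantCulture);
        });
    }
}
```

With this sketch, Render("Hello <year /> World", 2012) gives "Hello 2012 World" and Render("<year offset=\"2\" format=\"5\" />", 2012) gives "2014". A real implementation would also decide what format should mean.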


2011-09-21 12:32:16

string.Replace is enough for the first requirement; there's no need for a regex.

myString = myString.Replace("<year />", @"<year offset=""2"" />");

To extract the attribute value, you can Split on the " character:

var val = @"<year offset=""2"" />".Split('"')[1]; // val == "2"
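Extending the split idea to several attributes: for a single tag, splitting on " puts the values at the odd indices and the "name=" fragments just before them. A rough sketch (the SplitDemo class and SplitAttributes helper are hypothetical), which assumes the tag has already been isolated from the surrounding text and that no value contains a quote character:

```csharp
using System.Collections.Generic;

public static class SplitDemo
{
    // For <year offset="2" format="5" />, Split('"') yields
    // ["<year offset=", "2", " format=", "5", " />"]: values sit at odd indices,
    // and the fragment before each value ends with "name=".
    public static Dictionary<string, string> SplitAttributes(string tag)
    {
        var attrs = new Dictionary<string, string>();
        string[] parts = tag.Split('"');
        for (int i = 1; i < parts.Length; i += 2)
        {
            // Strip the trailing "=" (and any spaces around it), then take the last word.
            string before = parts[i - 1].TrimEnd().TrimEnd('=').TrimEnd();
            int nameStart = before.LastIndexOf(' ') + 1;
            attrs[before.Substring(nameStart)] = parts[i];
        }
        return attrs;
    }
}
```

This stays simple only while the input is one well-behaved tag; finding the tags inside a larger string still needs something like the regex approach from the other answer.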

Update (following the comments):

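You could try using the Html Agility Pack to parse and manipulate the text. It works well on HTML fragments, including malformed ones, though I'm not sure how it handles custom tags (worth a try). It may be overkill, though.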


2011-09-21 10:56:06 Oded

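I don't think he needs to replace one syntax with another; he needs to process the tags (and extract the parameters from the second syntax, if present) :) – Marco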

Yes, I need to extract the attribute values. – tmutton

I also need to be able to add more than one attribute to a tag. – tmutton
