2011-06-08 71 views
2

Possible duplicates:
How do I remove diacritics (accents) from a string in .NET?
How to change diacritic characters to non-diacritic ones

Remove accents from characters in C#?

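How can I convert á to a in C#?

For example: aéíúö => aeiuo

Hmm, I have now read those threads. (I didn't know these marks were called diacritics, so I couldn't search for them.)

I want to "drop" all diacritics except ñ.

Currently I have: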

public static string RemoveDiacritics(this string text) 
{ 
    string normalized = text.Normalize(NormalizationForm.FormD); 
    var sb = new StringBuilder(); 

    foreach (char c in from c in normalized 
         let u = CharUnicodeInfo.GetUnicodeCategory(c) 
         where u != UnicodeCategory.NonSpacingMark 
         select c) 
    { 
     sb.Append(c); 
    } 

    return sb.ToString().Normalize(NormalizationForm.FormC); 
} 

What would be the best way to leave ñ out of this?

My workaround is to do the following after the foreach:

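var result = sb.ToString();

if (text.Length != result.Length)
    throw new ArgumentOutOfRangeException();

int position = -1;
// IndexOf returns -1 when nothing is found, so test >= 0 to also catch a match at index 0
while ((position = text.IndexOf('ñ', position + 1)) >= 0)
{
    result = result.Remove(position, 1).Insert(position, "ñ");
}

return result; // return the patched string rather than the unpatched builder contents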

But I assume there is a less "manual" way to do this?

+3

See this post: http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net – keyboardP 2011-06-08 23:01:40

+0

It depends on the underlying code points. http://unicode.org/faq/char_combmark.html – Tim 2011-06-08 23:03:18
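
To illustrate the point about code points: ñ can be stored either as the single precomposed character U+00F1 or as a plain n followed by the combining tilde U+0303, and FormD normalization turns the first into the second. Here is a minimal sketch that prints the code units of a string; the class name CodePointDemo and the method Describe are made up for this illustration and are not part of .NET.

```csharp
using System;
using System.Text;

public static class CodePointDemo
{
    // Hypothetical helper: lists the UTF-16 code units of s as "U+XXXX", space separated.
    public static string Describe(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0)
                sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

Calling Describe on "\u00F1" and then on "\u00F1".Normalize(NormalizationForm.FormD) shows the difference between the two forms, which is why a filter on NonSpacingMark also strips the tilde from ñ.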

Answers

1

If you don't want to remove ñ, this is one option. It is fast.

static string[] pats3 = { "é", "É", "á", "Á", "í", "Í", "ó", "Ó", "ú", "Ú" }; 
    static string[] repl3 = { "e", "E", "a", "A", "i", "I", "o", "O", "u", "U" }; 
    static Dictionary<string, string> _var = null; 
    static Dictionary<string, string> dict 
    { 
     get 
     { 
      if (_var == null) 
      { 
       _var = pats3.Zip(repl3, (k, v) => new { Key = k, Value = v }).ToDictionary(o => o.Key, o => o.Value); 
      } 

      return _var; 
     } 
    } 
    private static string RemoveAccent(string text) 
    { 
     // using Zip as a shortcut, otherwise setup dictionary differently as others have shown 
     //var dict = pats3.Zip(repl3, (k, v) => new { Key = k, Value = v }).ToDictionary(o => o.Key, o => o.Value); 

     //string input = "åÅæÆäÄöÖøØèÈàÀìÌõÕïÏ"; 
     string pattern = String.Join("|", dict.Keys.Select(k => k)); // use ToArray() for .NET 3.5 
     string result = Regex.Replace(text, pattern, m => dict[m.Value]); 

     //Console.WriteLine("Pattern: " + pattern); 
     //Console.WriteLine("Input: " + text); 
     //Console.WriteLine("Result: " + result); 

     return result; 
    } 

If you do want to strip the ñ as well, a faster option is: Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(text));
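
For comparison, the normalization approach from the question can probably be adapted to keep ñ in a single pass, without patching the result afterwards. What follows is only a sketch under a few assumptions: the names DiacriticsHelper and RemoveDiacriticsExceptEnye are invented here, and a combining tilde is kept only when it directly follows n or N in the decomposed string, so an n carrying another mark before the tilde would still lose it.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsHelper
{
    // U+0303 COMBINING TILDE, the mark that turns n into ñ after FormD decomposition.
    private const char CombiningTilde = '\u0303';

    // Hypothetical helper: strips non-spacing marks but keeps the tilde on n/N.
    public static string RemoveDiacriticsExceptEnye(string text)
    {
        // Decompose so that each accented letter becomes base letter + combining mark(s).
        string normalized = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(normalized.Length);

        for (int i = 0; i < normalized.Length; i++)
        {
            char c = normalized[i];
            bool isMark =
                CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;
            // Assumption: only a tilde immediately after n or N is worth keeping.
            bool tildeOnN = c == CombiningTilde
                && i > 0
                && (normalized[i - 1] == 'n' || normalized[i - 1] == 'N');

            if (!isMark || tildeOnN)
                sb.Append(c);
        }

        // Recompose so that n + tilde becomes the single character ñ again.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this, RemoveDiacriticsExceptEnye("aéíúö ñ") gives "aeiuo ñ". Unlike the dictionary approach, it does not need a list of every accented letter, but it is tied to the one exception coded into the loop.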