How do C++ and g++ handle Unicode?

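I'm trying to figure out the proper way to handle Unicode in C++. I want to understand how g++ handles literal wide-character strings and regular C strings that contain Unicode characters. I have set up some basic tests and don't really understand what is happening.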

wstring ws1(L"«¬.txt"); // these first 2 characters correspond to 0xAB, 0xAC 
string s1("«¬.txt"); 

ifstream in_file(s1.c_str()); 
// wifstream in_file(s1.c_str()); // this throws an exception when I 
            // call in_file >> s; 
string s; 
in_file >> s; // s now contains «¬ 

wstring ws = textToWide(s); 

wcout << ws << endl; // these two lines work independently of each other, 
        // but combining them makes the second one print incorrectly 
cout << s << endl; 
printf("%s", s.c_str()); // same case here, these work independently of one another, 
          // but calling one after the other makes the second call 
          // print incorrectly 
wprintf(L"%ls", ws.c_str()); // %ls is the wide-string conversion; %s expects a char* 

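wstring textToWide(string s) 
{ 
    mbstate_t mbstate = mbstate_t(); // zero-initialized: the initial conversion state 
    const char *src = s.c_str(); 
    // With a null destination the call only counts wide characters; src is not advanced. 
    size_t numchars = mbsrtowcs(0, &src, 0, &mbstate); 
    if (numchars == (size_t)-1) 
        return wstring(); // invalid multibyte sequence for the current locale 
    wchar_t *buff = new wchar_t[numchars + 1]; 
    mbsrtowcs(buff, &src, numchars + 1, &mbstate); // also writes the terminating L'\0' 
    wstring ws = buff; 
    delete [] buff; 
    return ws; 
}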

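It seems that calling wcout and wprintf somehow corrupts the stream, and that it is always safe to call cout and printf as long as the strings are encoded as UTF-8.

Is the best way to handle Unicode to convert all input to wide before processing, and to convert all output to UTF-8 before sending it to output?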

Source

2013-08-19 Brian Schlenker

You might be interested in UTF-8 Everywhere (http://www.utf8everywhere.org/). – Angew

Yep. http://utf8everywhere.org –

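The most comprehensive way to handle Unicode is to use a Unicode library such as ICU. Unicode has many more aspects than a set of encodings. C++ provides no API for working with those additional aspects; ICU does.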

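If you only want to deal with encodings, then one workable approach is to use the built-in C++ facilities correctly. This includes calling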

std::setlocale(LC_ALL, 
       /*some system-specific locale name, probably */ "en_US.UTF-8")

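at the beginning of the program. Also, do not use both cout/printf and wcout/wprintf in the same program. (You can use regular and wide stream objects other than the standard handles in the same program.)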
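The usual explanation for the cout/wcout restriction is stream orientation: in the C library a stream becomes byte-oriented or wide-oriented on its first I/O operation and then stays that way, and by default cout and wcout both sit on top of stdout. The following is a minimal sketch of that behavior, not a fix. The names `Orientation`, `orientation_of`, `after_narrow_write` and `after_wide_write` are made up for this illustration; only `std::fwide`, `std::fputs` and `std::fputws` are standard.

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>

enum class Orientation { None, Byte, Wide };

// Query the orientation of a C stream without changing it.
Orientation orientation_of(std::FILE* stream)
{
    assert(stream != nullptr);
    int mode = std::fwide(stream, 0);   // mode 0 only queries
    if (mode < 0) return Orientation::Byte;
    if (mode > 0) return Orientation::Wide;
    return Orientation::None;
}

// Narrow output comes first, so the stream becomes byte-oriented.
// The later request for wide orientation is ignored.
Orientation after_narrow_write(std::FILE* stream)
{
    std::fputs("narrow", stream);
    std::fwide(stream, 1);
    return orientation_of(stream);
}

// Wide output comes first, so the stream becomes wide-oriented.
// The later request for byte orientation is ignored.
Orientation after_wide_write(std::FILE* stream)
{
    std::fputws(L"wide", stream);
    std::fwide(stream, -1);
    return orientation_of(stream);
}
```

Calling `orientation_of(stdout)` before and after the first `printf` or `wprintf` in a program should show the same transition. Once stdout is oriented one way, output functions of the other kind are not expected to work on it, which matches the symptom described in the question.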

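Converting all input to wide and all output to UTF-8 is a reasonable strategy. Using UTF-8 throughout is also reasonable. A lot depends on your application. C++11 has built-in UTF-8, UTF-16 and UTF-32 string types that can simplify the task to some extent.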
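As a rough illustration of the "wide inside, UTF-8 at the boundaries" strategy, here is a hand-written sketch that converts between `std::u32string` (one code point per element) and a UTF-8 `std::string`. The function names `to_utf8` and `from_utf8` are hypothetical, and the code does not depend on the locale. A real project would more likely rely on a library such as ICU.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Encode UTF-32 code points as UTF-8 bytes.
std::string to_utf8(const std::u32string& in)
{
    std::string out;
    for (char32_t c : in) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Decode UTF-8 bytes into UTF-32 code points, rejecting malformed input.
std::u32string from_utf8(const std::string& in)
{
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        assert(len >= 1 && len <= 4);
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char t = static_cast<unsigned char>(in[i + k]);
            if ((t & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (t & 0x3F);
        }
        // Reject overlong encodings, surrogates and out-of-range values.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 sequence");
        out += cp;
        i += len;
    }
    return out;
}
```

With this sketch, a file name read as UTF-8 bytes would be decoded once with `from_utf8`, processed as code points, and encoded again with `to_utf8` just before it is written out.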

Whatever you do, do not use elements of the extended character set in string literals. (In C++11 it is OK to use them in UTF-8/16/32 string literals.)
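One way to follow this advice, sketched here under the assumption that narrow strings are meant to hold UTF-8, is to spell non-ASCII characters with escapes so the source file itself stays pure ASCII. The helper names `narrow_name`, `wide_name` and `check_names` are invented for the example; the file name is the one from the question.

```cpp
#include <cassert>
#include <string>

// "«¬.txt" as explicit UTF-8 bytes: U+00AB is C2 AB, U+00AC is C2 AC.
// A hex escape consumes every following hex digit, so it is safe here only
// because '.' is not one; otherwise split the literal ("\xAC" "cd").
std::string narrow_name()
{
    return "\xC2\xAB\xC2\xAC.txt";
}

// The same name as a wide string, using universal character names.
std::wstring wide_name()
{
    return L"\u00AB\u00AC.txt";
}

// Sanity check: 8 bytes in UTF-8, 6 wide characters.
void check_names()
{
    assert(narrow_name().size() == 8);
    assert(wide_name().size() == 6);
}
```

Written this way, the result does not depend on how the compiler interprets the encoding of the source file.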

Source

2013-08-19 19:04:27
