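I'm trying to figure out the correct way to handle Unicode in C++. I want to understand how g++ handles wide-character string literals and regular C strings that contain Unicode characters. I've set up some basic tests and don't really understand what is happening. How do C++ and g++ handle Unicode?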
    wstring ws1(L"«¬.txt"); // these first 2 characters correspond to 0xAB, 0xAC
    string s1("«¬.txt");

    ifstream in_file(s1.c_str());
    // wifstream in_file(s1.c_str()); // this throws an exception when I
    //                                // call in_file >> s;
    string s;
    in_file >> s; // s now contains «¬

    wstring ws = textToWide(s);

    wcout << ws << endl; // these two lines work independently of each other,
                         // but combining them makes the second one print incorrectly
    cout << s << endl;

    printf("%s", s.c_str()); // same case here, these work independently of one another,
                             // but calling one after the other makes the second call
                             // print incorrectly
    wprintf(L"%s", ws.c_str());

    wstring textToWide(string s)
    {
        mbstate_t mbstate;
        char *cc = new char[s.length() + 1];
        strcpy(cc, s.c_str());
        cc[s.length()] = 0;
        size_t numbytes = mbsrtowcs(0, (const char **)&cc, 0, &mbstate);
        wchar_t *buff = new wchar_t[numbytes + 1];
        mbsrtowcs(buff, (const char **)&cc, numbytes + 1, &mbstate);
        wstring ws = buff;
        delete [] cc;
        delete [] buff;
        return ws;
    }
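
A note on the helper above: `mbstate` is used without being initialized, the return value of `mbsrtowcs` is never checked against `(size_t)-1`, and a successful `mbsrtowcs` call with a non-null destination sets the source pointer to null, so `delete [] cc` no longer frees the original buffer. The following is only a sketch of one way to write the same conversion more defensively. The name `text_to_wide` is hypothetical, and the function assumes the program has already selected a suitable locale (for example with `std::setlocale(LC_ALL, "")`), because `mbsrtowcs` decodes according to the current `LC_CTYPE` setting.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical replacement for textToWide. Converts a multibyte string,
// interpreted in the encoding of the current LC_CTYPE locale, to a wide string.
// Conversion stops at the first embedded null character, if any.
std::wstring text_to_wide(const std::string& s)
{
    std::mbstate_t state = std::mbstate_t();  // zero-initialized: initial shift state
    const char* src = s.c_str();

    // First pass: a null destination only counts the wide characters needed.
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for the current locale");

    // Second pass: convert into a buffer that frees itself.
    std::vector<wchar_t> buf(len + 1);
    state = std::mbstate_t();
    src = s.c_str();  // mbsrtowcs may overwrite src, so work on a copy of the pointer
    std::mbsrtowcs(buf.data(), &src, buf.size(), &state);

    return std::wstring(buf.data(), len);
}
```

With a UTF-8 locale active, `text_to_wide(s)` would be expected to decode the file contents read above; without a `setlocale` call the default "C" locale is in effect and is not guaranteed to decode bytes outside ASCII.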
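It seems that calling wcout and wprintf somehow corrupts the stream, and that it is always safe to call cout and printf as long as the strings are encoded as UTF-8.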
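
A likely explanation for this observation is stream orientation. In C, a stream such as `stdout` becomes either byte-oriented or wide-oriented on its first I/O operation, and the other family of functions must not be used on it afterwards. `cout`/`printf` and `wcout`/`wprintf` all write to `stdout`, so whichever is called first fixes the orientation. Separately, `wprintf` needs `%ls` rather than `%s` to print a `wchar_t*`. The sketch below only queries the orientation with `std::fwide(stream, 0)`; the helper name `describe_orientation` is made up for illustration.

```cpp
#include <cstdio>
#include <cwchar>
#include <string>

// Hypothetical helper: reports the current orientation of a C stream.
// std::fwide with mode 0 only queries; it does not change the orientation.
std::string describe_orientation(std::FILE* stream)
{
    int mode = std::fwide(stream, 0);
    if (mode < 0) return "byte";  // narrow functions (printf, fputs, ...) were used first
    if (mode > 0) return "wide";  // wide functions (wprintf, fputws, ...) were used first
    return "unset";               // no I/O has been performed on the stream yet
}
```

Calling `describe_orientation(stdout)` before and after the first `printf` or `wprintf` in a test program should show which family claimed the stream.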
Is the best way to handle Unicode to convert all input to wide strings before processing, and to convert all output to UTF-8 before sending it to output?
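
To make the "convert at the boundary" idea concrete, here is a minimal, locale-independent sketch that decodes UTF-8 bytes into code points. It is an illustration, not a recommendation of one approach over another: the name `decode_utf8` is hypothetical, it uses `char32_t` rather than `wchar_t` (an assumption made because the width of `wchar_t` differs between platforms), and it does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes a UTF-8 byte string into Unicode code points.
// Throws std::invalid_argument on a bad lead byte, a bad continuation byte,
// or a truncated sequence. Overlong forms and surrogates are NOT rejected.
std::u32string decode_utf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the low 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For the file name in the question, `decode_utf8("\xC2\xAB\xC2\xAC.txt")` yields the six code points U+00AB, U+00AC, '.', 't', 'x', 't'.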
You might be interested in [UTF8 everywhere](http://www.utf8everywhere.org/). – Angew
Yep. http://utf8everywhere.org –