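Wrong endianness with wstring_convert
I recently discovered the <codecvt> header, so I wanted to convert between UTF-8 and UTF-16. I am using wstring_convert with the codecvt_utf8_utf16 facet from C++11. The problem is that when I convert a UTF-16 string to UTF-8 and then back to UTF-16, the byte order of each code unit changes.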
For this code:
#include <codecvt>
#include <string>
#include <locale>
#include <iostream>
using namespace std;

int main(int argc, char const *argv[])
{
    // Converter between UTF-8 bytes and UTF-16 code units.
    wstring_convert<codecvt_utf8_utf16<char16_t>, char16_t> convert;
    u16string utf16 = u"\ub098\ub294\ud0dc\uc624";

    cout << hex << "UTF-16\n\n";
    for (char16_t c : utf16)
        cout << "[" << static_cast<unsigned>(c) << "] ";  // print the numeric value

    string utf8 = convert.to_bytes(utf16);
    cout << "\n\nUTF-16 to UTF-8\n\n";
    for (unsigned char c : utf8)
        cout << "[" << int(c) << "] ";

    cout << "\n\nConverting back to UTF-16\n\n";
    utf16 = convert.from_bytes(utf8);
    for (char16_t c : utf16)
        cout << "[" << static_cast<unsigned>(c) << "] ";
    cout << endl;
}
I get this output:
UTF-16
[b098] [b294] [d0dc] [c624]
UTF-16 to UTF-8
[eb] [82] [98] [eb] [8a] [94] [ed] [83] [9c] [ec] [98] [a4]
Converting back to UTF-16
[98b0] [94b2] [dcd0] [24c6]
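Comparing the first and last lines of the output, each code unit appears to come back with its two bytes swapped (b098 becomes 98b0), while the UTF-8 bytes in between are the expected encoding of the input. A minimal sketch of a check for that pattern follows. The names `swap_bytes`, `is_byte_swapped`, and `check_reported_output` are hypothetical helpers introduced here for illustration and are not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Swap the two bytes of a single UTF-16 code unit, e.g. 0xb098 -> 0x98b0.
constexpr char16_t swap_bytes(char16_t unit)
{
    return static_cast<char16_t>(((unit & 0x00ffu) << 8) | ((unit & 0xff00u) >> 8));
}

// Hypothetical helper: true if `actual` is non-empty, has the same length as
// `expected`, and every code unit equals the byte-swapped original.
bool is_byte_swapped(const std::u16string& expected, const std::u16string& actual)
{
    if (expected.empty() || expected.size() != actual.size())
        return false;
    for (std::size_t i = 0; i < expected.size(); ++i)
        if (actual[i] != swap_bytes(expected[i]))
            return false;
    return true;
}

// Self-check using the code units reported in the output above.
void check_reported_output()
{
    const std::u16string original{0xb098, 0xb294, 0xd0dc, 0xc624};
    const std::u16string observed{0x98b0, 0x94b2, 0xdcd0, 0x24c6};
    assert(is_byte_swapped(original, observed));
}
```

Calling `is_byte_swapped(utf16_before, utf16_after)` right after `from_bytes` would distinguish a pure byte-order problem from some other kind of corruption.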
When I change the third template parameter of wstring_convert to std::little_endian, the bytes are reversed.
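For reference, std::little_endian is a std::codecvt_mode flag, and the place that accepts it is the third template parameter (Mode) of codecvt_utf8_utf16; the third template parameter of wstring_convert itself is an allocator type. The sketch below assumes that the facet's Mode is what is being changed. `round_trip`, `default_mode`, and `check_default_round_trip` are hypothetical names used only for illustration. Note that the <codecvt> facilities are deprecated since C++17, so newer compilers may emit warnings.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UTF-16 -> UTF-8 -> UTF-16 through a codecvt_utf8_utf16
// facet instantiated with the given mode flags.
template <std::codecvt_mode Mode>
std::u16string round_trip(const std::u16string& input)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t, 0x10ffff, Mode>, char16_t> convert;
    return convert.from_bytes(convert.to_bytes(input));
}

// The facet's default mode is codecvt_mode(0), i.e. no flags set.
constexpr std::codecvt_mode default_mode = static_cast<std::codecvt_mode>(0);

// Assumption: a round trip of valid text through the default facet
// should return the input unchanged.
void check_default_round_trip()
{
    const std::u16string text = u"\ub098\ub294\ud0dc\uc624";
    assert(round_trip<default_mode>(text) == text);
}
```

Comparing `round_trip<default_mode>(text)` with `round_trip<std::little_endian>(text)` on the affected toolchain would show whether the flag is what changes the result of `from_bytes`.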
What am I missing?
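Cannot reproduce: http://coliru.stacked-crooked.com/a/5599be701f3ebb32 – Cubbi
Thanks for the reply. That is strange. I'm using gcc 5; I will try compiling it from source tonight and see whether I get the same behavior. – Dante
Switching the compiler to gcc does not reproduce the problem on coliru either: http://coliru.stacked-crooked.com/a/cbac3e56d8f55c30 – Cubbi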
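Since the result differs between machines, it may help to confirm exactly which compiler and standard library each build uses. The sketch below is one way to report that, assuming a GCC- or Clang-style toolchain. `toolchain_description` is a hypothetical helper, while `__VERSION__`, `__GLIBCXX__`, and `_LIBCPP_VERSION` are macros provided by those compilers and libraries.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: describe the compiler and standard library in use, so
// that two people comparing results can check they build with the same toolchain.
std::string toolchain_description()
{
    std::string out = "compiler: ";
#if defined(__VERSION__)
    out += __VERSION__;  // version string predefined by GCC and Clang
#else
    out += "unknown";
#endif
#if defined(__GLIBCXX__)
    out += "; libstdc++ date stamp: " + std::to_string(__GLIBCXX__);
#elif defined(_LIBCPP_VERSION)
    out += "; libc++ version: " + std::to_string(_LIBCPP_VERSION);
#else
    out += "; standard library: unknown";
#endif
    assert(!out.empty());
    return out;
}
```

Printing `toolchain_description()` next to the conversion output on both the local machine and coliru would make any version difference visible.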