wchar_t* with UTF-8 characters in MSVC

I am trying to format a wchar_t* containing UTF-8 characters using vsnprintf, and then print the buffer using printf.

Consider the following code:

/* 
    This code is a modified version of the KB sample: 
    https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/rtref/vsnprintf.htm 

    The usage of `setlocale` is required by my real-world scenario, 
    but can be modified if that fixes the issue. 
*/ 

#include <wchar.h> 
#include <stdarg.h> 
#include <stdio.h> 
#include <locale.h> 

#ifdef MSVC 
#include <windows.h> 
#endif 

void vout(char *string, char *fmt, ...) 
{ 
    setlocale(LC_CTYPE, "en_US.UTF-8"); 
    va_list arg_ptr; 

    va_start(arg_ptr, fmt); 
    vsnprintf(string, 100, fmt, arg_ptr); 
    va_end(arg_ptr); 
} 

int main(void) 
{ 
    setlocale(LC_ALL, ""); 
#ifdef MSVC 
    SetConsoleOutputCP(65001); // with or without; no dice 
#endif 

    char string[100]; 

    wchar_t arr[] = { 0x0119, L'\0' }; /* terminated so that %ls stops after one character */
    vout(string, "%ls", arr); 
    printf("This string should have 'ę' (e with ogonek/tail) after colon: %s\n", string); 
    return 0; 
}

With GCC v5.4 on Ubuntu 16, I get the desired output in Bash:

gcc test.c -o test_vsn 
./test_vsn 
This string should have 'ę' (e with ogonek/tail) after colon: ę

However, with CL v19.10.25019 (VS 2017) on Windows 10, I get weird output in CMD:

cl test.c /Fetest_vsn /utf-8 
.\test_vsn 
This string should have 'T' (e with ogonek/tail) after colon: e

(The ę before the colon becomes T, and the character after the colon is an e without the ogonek.)

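Note that I am using CL's new /utf-8 switch (introduced in VS 2015), which apparently makes no difference whether it is present or not. According to their blog post:

There is also a /utf-8 option that is a synonym for setting "/source-charset:utf-8" and "/execution-charset:utf-8".

(My source file already has a BOM and is UTF-8 encoded, and execution-charset apparently does not help.)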
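One way to see what the execution character set actually changes is to dump the bytes of a narrow string literal. The sketch below is only an illustration: hex_bytes is a hypothetical helper, not part of the CRT or of the code above. With a UTF-8 execution character set, a literal containing ę (U+0119) is expected to hold the two bytes C4 99.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats the bytes of s as space-separated hex pairs
   into out (for example "C4 99"). Returns the number of characters written,
   not counting the terminator. Stops early if out is too small. */
static size_t hex_bytes(const char *s, char *out, size_t out_size)
{
    size_t used = 0;

    assert(s != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    for (; *s != '\0'; ++s) {
        /* Put a space before every byte except the first one. */
        int n = snprintf(out + used, out_size - used,
                         used ? " %02X" : "%02X",
                         (unsigned)(unsigned char)*s);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0'; /* drop the partially written byte */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

Calling hex_bytes on a literal such as "ę" and printing the result, once with and once without /utf-8, shows whether the compiler stored the character as UTF-8 or in some other code page.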

What is the minimal change to the code or compiler switches that makes the output look identical to that of gcc?

Source

2017-08-01 vulcan raven

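On Windows, printf() (and the console in general) does not support UTF-8. You can use WideCharToMultiByte() (or an equivalent) to convert UTF-16 encoded wchar_t data to UTF-8, but that still does not guarantee that the console will display it correctly. You should use the Unicode console APIs to write Unicode data to the console, such as the Win32 WriteConsoleW() function, or std::wcout in C++. There are many questions on StackOverflow about how to output Unicode data to the Windows console. Your reputation is high enough that you should know to do some research before asking.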
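To make the conversion step concrete, here is a minimal portable sketch of what an "equivalent" of WideCharToMultiByte() does for a single code point. utf8_encode is a hypothetical helper written for illustration; it is not a Win32 or CRT function, and it takes a full code point, so UTF-16 surrogate pairs would have to be combined before calling it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one Unicode code point as UTF-8.
   Writes up to 4 bytes to out and returns the number of bytes written,
   or 0 if cp is a surrogate or lies outside the Unicode range. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                       /* 1 byte: ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: includes U+0119 */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {    /* lone surrogate: invalid */
        return 0;
    }
    if (cp < 0x10000) {                    /* 3 bytes: rest of the BMP */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: supplementary planes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond U+10FFFF */
}
```

For U+0119 (ę) this yields the two bytes 0xC4 0x99. Even with correct UTF-8 bytes in the buffer, the console still has to interpret them as UTF-8, which is the part the comment warns is not guaranteed.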

You can also run the PowerShell IDE, navigate to your program's directory, and run your program from there.

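@RemyLebeau, thanks. I will try WideCharToMultiByte() and the other Unicode console APIs. I did do some research, but got lost in the product versions (e.g. vsnprintf being included OOTB since VS2015, and so on). Will read more. :)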

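Based on @RemyLebeau's comment, I modified the code to use the w variants of the printf API, so that the output of msvc on Windows is identical to that of gcc on Unix.

Also, instead of changing the code page, I now use _setmode (the FILE translation mode).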

/* 
    This code is a modified version of the KB sample: 
    https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/rtref/vsnprintf.htm 

    The usage of `setlocale` is required by my real-world scenario, 
    but can be modified if that fixes the issue. 
*/ 

#include <wchar.h> 
#include <stdarg.h> 
#include <stdio.h> 
#include <locale.h> 

#ifdef _WIN32 
#include <io.h> //for _setmode 
#include <fcntl.h> //for _O_U16TEXT 
#endif 

void vout(wchar_t *string, wchar_t *fmt, ...) 
{ 
    setlocale(LC_CTYPE, "en_US.UTF-8"); 
    va_list arg_ptr; 

    va_start(arg_ptr, fmt); 
    vswprintf(string, 100, fmt, arg_ptr); 
    va_end(arg_ptr); 
} 

int main(void) 
{ 
    setlocale(LC_ALL, ""); 
#ifdef _WIN32 
    int oldmode = _setmode(_fileno(stdout), _O_U16TEXT); 
#endif 

    wchar_t string[100]; 

    wchar_t arr[] = { 0x0119, L'\0' }; 
    vout(string, L"%ls", arr); 
    wprintf(L"This string should have 'ę' (e with ogonek/tail) after colon: %ls\r\n", string); 

#ifdef _WIN32 
    _setmode(_fileno(stdout), oldmode); 
#endif 
    return 0; 
}

Alternatively, we can use fwprintf and pass stdout as the first argument. To do the same with fwprintf(stderr, format, args) (or perror(format, args)), we also need to call _setmode on stderr.
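A minimal sketch of that idea follows, assuming the same _setmode approach as in the code above. set_wide_mode and report_error are hypothetical helper names introduced here for illustration; on non-Windows platforms the mode switch is simply skipped.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

#ifdef _WIN32
#include <io.h>    /* for _setmode and _fileno */
#include <fcntl.h> /* for _O_U16TEXT */
#endif

/* Hypothetical helper: puts a stream into UTF-16 text mode on Windows.
   Returns the previous mode (or -1 on failure) on Windows, and 0 on
   other platforms, where no mode switch is performed. */
static int set_wide_mode(FILE *stream)
{
    assert(stream != NULL);
#ifdef _WIN32
    return _setmode(_fileno(stream), _O_U16TEXT);
#else
    return 0;
#endif
}

/* Hypothetical helper: writes a wide-character message to stderr.
   Returns the result of fwprintf (negative on error). */
static int report_error(const wchar_t *message)
{
    assert(message != NULL);
    return fwprintf(stderr, L"error: %ls\n", message);
}
```

The intended usage is to call set_wide_mode(stdout) and set_wide_mode(stderr) once at the start of main, before anything is written to either stream, and to restore the saved modes with _setmode before returning if that matters to the program.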

Source

2017-08-02 12:47:22
