2011-12-11 32 views
4

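I am trying to print a string of UTF-16 characters. I posted this question a while back, and the advice given was to convert to UTF-32 using iconv and print the result as a wchar_t string. How do I convert UTF-16 to UTF-32 and print the resulting wchar_t in C?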

I did some research and managed to write the following:

// *c is the pointer to the characters (UTF-16) i'm trying to print 
// sz is the size in bytes of the input i'm trying to print 

iconv_t icv; 
char in_buf[sz]; 
char* in; 
size_t in_sz; 
char out_buf[sz * 2]; 
char* out; 
size_t out_sz; 

icv = iconv_open("UTF-32", "UTF-16"); 

memcpy(in_buf, c, sz); 

in = in_buf; 
in_sz = sz; 
out = out_buf; 
out_sz = sz * 2; 

size_t ret = iconv(icv, &in, &in_sz, &out, &out_sz); 
printf("ret = %d\n", ret); 
printf("*** %ls ***\n", ((wchar_t*) out_buf)); 

The call to iconv always returns 0, so I assume the conversion is OK?

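However, printing seems to be hit and miss. Sometimes the converted wchar_t string prints fine. Other times printf appears to hit a problem while printing the wchar_t string and the call terminates altogether, so that even the trailing "***" is not printed.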

I also tried using

wprintf(((wchar_t*) "*** %ls ***\n"), out_buf)); 

but nothing was ever printed.

Am I missing something here?

Reference: How to Print UTF-16 Characters in C?

UPDATE

I have incorporated some of the suggestions from the comments.

Updated code:

// *c is the pointer to the characters (UTF-16) i'm trying to print 
// sz is the size in bytes of the input i'm trying to print 

iconv_t icv; 
char in_buf[sz]; 
char* in; 
size_t in_sz; 
wchar_t out_buf[sz/2]; 
char* out; 
size_t out_sz; 

icv = iconv_open("UTF-32", "UTF-16"); 

memcpy(in_buf, c, sz); 

in = in_buf; 
in_sz = sz; 
out = (char*) out_buf; 
out_sz = sz * 2; 

size_t ret = iconv(icv, &in, &in_sz, &out, &out_sz); 
printf("ret = %d\n", ret); 
printf("*** %ls ***\n", out_buf); 
wprintf(L"*** %ls ***\n", out_buf); 

Still the same result: not all UTF-16 strings get printed (with either printf or wprintf).

What else could I be missing?

By the way, I am using Linux and have verified that wchar_t is 4 bytes.

+1

'wprintf()' requires the format string to have an 'L' prefix, e.g. 'wprintf(L"*** %ls ***\n", out_buf)'.

+1

Why are you copying the input into a local buffer 'in_buf'? Just use 'c' directly...

+1

You also cannot legally cast a pointer to a 'char' array to a 'wchar_t' pointer. The output buffer needs to have the type 'wchar_t[n]'.

Answer

4

Here is a short program that converts UTF-16 to a wide character array and then prints it.

#include <endian.h> 
#include <errno.h> 
#include <iconv.h> 
#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 
#include <wchar.h> 

#define FROMCODE "UTF-16" 

#if (BYTE_ORDER == LITTLE_ENDIAN) 
#define TOCODE "UTF-32LE" 
#elif (BYTE_ORDER == BIG_ENDIAN) 
#define TOCODE "UTF-32BE" 
#else 
#error Unsupported byte order 
#endif 

int main(void) 
{ 
    void *tmp; 
    char *outbuf; 
    const char *inbuf; 
    long converted = 0; 
    wchar_t *out = NULL; 
    int status = EXIT_SUCCESS, n; 
    size_t inbytesleft, outbytesleft, size; 
    const char in[] = { 
     0xff, 0xfe, 
     'H', 0x0, 
     'e', 0x0, 
     'l', 0x0, 
     'l', 0x0, 
     'o', 0x0, 
     ',', 0x0, 
     ' ', 0x0, 
     'W', 0x0, 
     'o', 0x0, 
     'r', 0x0, 
     'l', 0x0, 
     'd', 0x0, 
     '!', 0x0 
    }; 
    iconv_t cd = iconv_open(TOCODE, FROMCODE); 
    if ((iconv_t)-1 == cd) { 
     if (EINVAL == errno) { 
      fprintf(stderr, "iconv: cannot convert from %s to %s\n", 
        FROMCODE, TOCODE); 
     } else { 
      fprintf(stderr, "iconv: %s\n", strerror(errno)); 
     } 
     goto error; 
    } 
    size = sizeof(in) * sizeof(wchar_t); 
    inbuf = in; 
    inbytesleft = sizeof(in); 
    while (1) { 
     tmp = realloc(out, size + sizeof(wchar_t)); 
     if (!tmp) { 
      fprintf(stderr, "realloc: %s\n", strerror(errno)); 
      goto error; 
     } 
     out = tmp; 
     outbuf = (char *)out + converted; 
     outbytesleft = size - converted; 
     n = iconv(cd, (char **)&inbuf, &inbytesleft, &outbuf, &outbytesleft); 
     if (-1 == n) { 
      if (EINVAL == errno) { 
       /* junk at the end of the buffer, ignore it */ 
       break; 
      } else if (E2BIG != errno) { 
       /* unrecoverable error */ 
       fprintf(stderr, "iconv: %s\n", strerror(errno)); 
       goto error; 
      } 
      /* increase the size of the output buffer */ 
      converted = size - outbytesleft; 
      size <<= 1; 
     } else { 
      /* done */ 
      break; 
     } 
    } 
    converted = (size - outbytesleft)/sizeof(wchar_t); 
    out[converted] = L'\0'; 
    fprintf(stdout, "%ls\n", out); 
    /* flush the iconv buffer */ 
    iconv(cd, NULL, NULL, &outbuf, &outbytesleft); 
exit: 
    if (out) { 
     free(out); 
    } 
    /* iconv_open reports failure as (iconv_t)-1, not as a null value */
    if ((iconv_t)-1 != cd) { 
     iconv_close(cd); 
    } 
    exit(status); 
error: 
    status = EXIT_FAILURE; 
    goto exit; 
} 

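Since UTF-16 is a variable-length encoding, you are only guessing at how big your output buffer needs to be. A proper program should handle the case where the output buffer is not large enough to hold the converted data.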
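The size guess can be replaced by an upper bound. A minimal sketch, assuming the input length is given in bytes: every 16-bit code unit produces at most one 4-byte UTF-32 code point (a surrogate pair consumes two units but yields only one), so the bound below is always sufficient. `utf32_worst_case_bytes` is a hypothetical helper, not part of iconv. The extra margin reserves room for a byte order mark, which a plain "UTF-32" target may emit, and for a terminator.

```c
#include <stddef.h>

/* Hypothetical helper (not an iconv function): upper bound, in bytes, on the
 * UTF-32 output produced from in_bytes bytes of UTF-16 input. */
size_t utf32_worst_case_bytes(size_t in_bytes)
{
    size_t units = in_bytes / 2;   /* number of 16-bit code units */

    /* One 4-byte code point per unit at most, plus one slot for a possible
     * byte order mark and one slot for a terminating zero. */
    return (units + 2) * 4;
}
```

With such a bound the E2BIG branch should not trigger, although keeping the growth loop costs little and guards against a wrong assumption.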

You should also note that iconv does not null-terminate your output buffer for you.
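The termination step can be isolated in a small wrapper. This is a sketch under assumptions, not a definitive implementation: `utf16le_to_wcs` is a hypothetical name, the input is assumed to be UTF-16LE without a byte order mark, and the iconv implementation is assumed to accept the encoding name "WCHAR_T" (glibc and GNU libiconv do). One element of the output array is held back so the terminator always fits.

```c
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: converts in_bytes bytes of UTF-16LE into out, which
 * has room for out_cap wchar_t elements including the terminator.
 * Returns the number of wide characters written (terminator excluded),
 * or (size_t)-1 on any failure. */
size_t utf16le_to_wcs(const char *in, size_t in_bytes,
                      wchar_t *out, size_t out_cap)
{
    if (out_cap == 0)
        return (size_t)-1;

    /* Assumption: "WCHAR_T" is a known encoding name on this platform. */
    iconv_t cd = iconv_open("WCHAR_T", "UTF-16LE");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;        /* iconv takes char **, input is not modified */
    char *dst = (char *)out;
    size_t dst_left = (out_cap - 1) * sizeof(wchar_t);  /* keep one slot free */

    size_t rc = iconv(cd, &src, &in_bytes, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* iconv advanced dst past the converted data; terminate by hand. */
    size_t n = (size_t)(dst - (char *)out) / sizeof(wchar_t);
    out[n] = L'\0';
    return n;
}
```

When the result is printed with "%ls", the wide characters are converted according to the current locale, so non-ASCII text generally needs a prior call such as setlocale(LC_ALL, "") with a UTF-8 locale selected in the environment.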

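Iconv is a stream-oriented processor, so you need to flush the iconv_t if you want to reuse it for another conversion (the sample code does this near the end). If you wanted to do stream processing, you would handle the EINVAL error by copying any bytes left in the input buffer to the beginning of a new input buffer before calling iconv again.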
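The EINVAL handling for stream processing might look like the sketch below. Everything here is illustrative: `struct utf16_stream`, `stream_feed` and `CHUNK_MAX` are hypothetical names, and the code assumes that on EINVAL iconv leaves the input pointer at the start of the incomplete sequence, which is the behavior POSIX describes. Leftover bytes are stored and prepended to the next chunk.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_MAX 256   /* hypothetical limit on the size of one input chunk */

/* Hypothetical streaming state: the conversion descriptor plus the bytes of
 * an incomplete UTF-16 sequence carried over from the previous chunk. */
struct utf16_stream {
    iconv_t cd;
    char pending[4];
    size_t npending;
};

/* Converts one chunk. Returns the number of bytes written to out,
 * or (size_t)-1 on an unrecoverable error. */
size_t stream_feed(struct utf16_stream *s, const char *chunk, size_t len,
                   char *out, size_t out_cap)
{
    char work[sizeof s->pending + CHUNK_MAX];

    if (len > CHUNK_MAX)
        return (size_t)-1;

    /* Start the new input buffer with the bytes left over from last time. */
    memcpy(work, s->pending, s->npending);
    memcpy(work + s->npending, chunk, len);

    char *src = work;
    size_t src_left = s->npending + len;
    char *dst = out;
    size_t dst_left = out_cap;

    s->npending = 0;
    if (iconv(s->cd, &src, &src_left, &dst, &dst_left) == (size_t)-1) {
        /* EINVAL: the chunk ended in the middle of a character. */
        if (errno != EINVAL || src_left > sizeof s->pending)
            return (size_t)-1;
        memcpy(s->pending, src, src_left);
        s->npending = src_left;
    }
    return out_cap - dst_left;
}
```

A caller would open the descriptor once with iconv_open, zero `npending`, and call `stream_feed` for each chunk. Bytes still pending after the last chunk indicate truncated input.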