Function to go back 1 UTF-8 character?

I have a function that advances by 1 UTF-8 character and returns the number of bytes it took to get there:

// Moves the iterator to next unicode character in the string, 
//returns number of bytes skipped 
template<typename _Iterator1, typename _Iterator2> 
inline size_t bringToNextUnichar(_Iterator1& it, 
    const _Iterator2& last) const { 
    if(it == last) return 0; 
    unsigned char c; 
    size_t res = 1; 
    for(++it; last != it; ++it, ++res) { 
     c = *it; 
     if(!(c&0x80) || ((c&0xC0) == 0xC0)) break; 
    } 

    return res; 
}

How can I modify this so that I can go back from an arbitrary position to the previous Unicode character?

Thanks

Source

2011-02-11 jmasterx

Just decrement the iterator instead of incrementing it.

// Moves the iterator to previous unicode character in the string, 
//returns number of bytes skipped 
template<typename _Iterator1, typename _Iterator2> 
inline size_t bringToPrevUnichar(_Iterator1& it, 
    const _Iterator2& first) const { 
    if(it == first) return 0; 
    unsigned char c; 
    size_t res = 1; 
    for(--it; first != it; --it, ++res) { // Note: --it instead of ++it 
     c = *it; 
     if(!(c&0x80) || ((c&0xC0) == 0xC0)) break; 
    } 

    return res; 
}

Source

2011-02-11 01:28:50 Maz

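UTF-8 can take more than 1 char per character. – 2011-02-11 01:33:30

UTF-8 start bytes are either 0xxxxxxx or 11xxxxxx. No other bytes in a UTF-8 stream match those patterns. From this you can design a function boolean isStartByte(unsigned char c). From there, the rest is routine work with C++ iterators. Have fun.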
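A minimal sketch of the predicate described above, assuming well-formed UTF-8 input. The answer only names isStartByte; the body here is an assumption, and countCodePoints is a hypothetical helper added for illustration. "Either 0xxxxxxx or 11xxxxxx" is the same as "not 10xxxxxx", so one mask is enough.

```cpp
#include <cstddef>
#include <string>

// A byte starts a character unless it is a continuation byte (10xxxxxx).
// Assumes well-formed UTF-8; bytes that never occur in UTF-8 are not rejected.
constexpr bool isStartByte(unsigned char c) {
    return (c & 0xC0) != 0x80;
}

// Hypothetical helper: counts characters by counting start bytes.
inline std::size_t countCodePoints(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if (isStartByte(c)) ++n;
    }
    return n;
}
```

Because every character has exactly one start byte, counting start bytes gives the character count without decoding anything.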

Source

2011-02-11 01:34:14 rlibby

In UTF-8, there are three kinds of bytes...

0xxxxxxx : ASCII 
10xxxxxx : 2nd, 3rd or 4th byte of code 
11xxxxxx : 1st byte of multibyte code

So step back until you find a 0xxxxxxx or 11xxxxxx byte.
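The rule above could be sketched roughly as follows, assuming well-formed UTF-8 and bidirectional iterators. The names stepBackOneChar and lastChar are hypothetical and do not come from the thread.

```cpp
#include <cstddef>
#include <string>

// Moves `it` back to the first byte of the previous character.
// Returns the number of bytes skipped, or 0 if `it` was already at `first`.
template <typename BidirIt>
std::size_t stepBackOneChar(BidirIt& it, BidirIt first) {
    std::size_t skipped = 0;
    while (it != first) {
        --it;
        ++skipped;
        const unsigned char c = static_cast<unsigned char>(*it);
        // Stop at 0xxxxxxx or 11xxxxxx, i.e. anything that is not 10xxxxxx.
        if ((c & 0xC0) != 0x80) break;
    }
    return skipped;
}

// Example use: the bytes of the last character of a UTF-8 string.
inline std::string lastChar(const std::string& s) {
    auto it = s.end();
    stepBackOneChar(it, s.begin());
    return std::string(it, s.end());
}
```

Unlike a loop that stops before examining `first`, this version checks every byte it lands on, so it also stops cleanly at the start of the buffer.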

Source

2011-02-11 01:35:30 Steve314
