2017-02-28

Answer

5

The bytes 239, 191, 191, when decoded as UTF-8, give the Unicode code point U+FFFF:

iex(1)> <<x::utf8>> = <<239, 191, 191>> 
<<239, 191, 191>> 
iex(2)> x 
65535 
iex(3)> x == 0xFFFF 
true 

U+FFFF is a Unicode noncharacter. `String.valid?/1` has a list of all such characters and returns `false` when it encounters any of them.
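For reference, Unicode reserves 66 code points as noncharacters: the range U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFF). Below is a minimal sketch of that classification. The module `NonChar` and the function `noncharacter?/1` are hypothetical names made up for illustration; they are not part of Elixir's standard library and are not necessarily how `String.valid?/1` performs its check.

```elixir
defmodule NonChar do
  # Hypothetical helper, not part of the standard library.

  # The contiguous block of 32 noncharacters in the Basic Multilingual Plane.
  def noncharacter?(cp) when cp in 0xFDD0..0xFDEF, do: true

  # The last two code points of every plane: U+xFFFE and U+xFFFF.
  def noncharacter?(cp)
      when cp in 0..0x10FFFF and rem(cp, 0x10000) in [0xFFFE, 0xFFFF],
      do: true

  # Everything else, including values outside the Unicode range.
  def noncharacter?(_), do: false
end
```

With this sketch, `NonChar.noncharacter?(0xFFFF)` returns `true` and `NonChar.noncharacter?(?a)` returns `false`.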


I couldn't find any function in Elixir that checks only UTF-8 validity and skips the noncharacter check, but it's trivial to write one:

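defmodule A do
  # Consume one UTF-8 encoded code point and recurse on the remainder.
  def valid_utf8?(<<_::utf8, rest::binary>>), do: valid_utf8?(rest)
  # Every byte was consumed, so the binary is valid UTF-8.
  def valid_utf8?(<<>>), do: true
  # The remaining bytes do not start with a valid UTF-8 sequence.
  def valid_utf8?(_), do: false
end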

for binary <- [<<0>>, <<239, 191, 191>>, <<128>>] do 
    IO.inspect {binary, String.valid?(binary), A.valid_utf8?(binary)} 
end 

Output:

{<<0>>, true, true} 
{<<239, 191, 191>>, false, true} 
{<<128>>, false, false}
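If it is also useful to know where a binary stops being valid UTF-8, the same pattern-matching approach can report the byte offset of the first bad sequence. This is only a sketch built on the idea above; the module `Utf8Scan` and the function `first_invalid/1` are hypothetical names, not an existing Elixir API.

```elixir
defmodule Utf8Scan do
  # Hypothetical helper, not part of the standard library.
  # Returns :ok for valid UTF-8, or {:error, offset} where offset is the
  # zero-based byte position of the first invalid sequence.
  def first_invalid(binary) when is_binary(binary) do
    scan(binary, byte_size(binary))
  end

  # Consume one UTF-8 encoded code point and keep scanning.
  defp scan(<<_::utf8, rest::binary>>, total), do: scan(rest, total)
  # Nothing left: the whole binary decoded cleanly.
  defp scan(<<>>, _total), do: :ok
  # The unread tail starts with an invalid sequence; derive its offset.
  defp scan(rest, total), do: {:error, total - byte_size(rest)}
end
```

For example, `Utf8Scan.first_invalid(<<"ab", 128>>)` returns `{:error, 2}`, because the stray continuation byte 128 sits at offset 2.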