2012-04-29 117 views
3

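How to split a string by character, paying attention to special characters

I am trying to split a string character by character, but I am running into problems with special characters. I currently use the following function: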

<?php 
$input = "Comment ça va?"; 
$array_input = str_split($input, 1); 
print_r($array_input); 
?> 

Here is the output:

Array (
[0] => C [1] => o [2] => m [3] => m [4] => e 
[5] => n [6] => t [7] => [8] => � [9] => � 
[10] => a [11] => [12] => v [13] => a [14] => ?) 

I have the same problem with line breaks:

Input:
"Hé!
Oui?"

Output:

Array ([0] => H [1] => � [2] => � [3] => ! [4] => 
[5] => [6] => O [7] => u [8] => i [9] => ?) 

Does anyone have a solution to this problem? Thank you very much.

Answers

3

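str_split has problems with Unicode strings: it splits by byte, so a multibyte UTF-8 character such as ç is cut into separate, invalid pieces.

You can use preg_split with the u modifier instead.

For example: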

$input = "Comment ça va?"; 
$letters1 = str_split($input); 
$letters2 = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY); 

print_r($letters1); 
print_r($letters2); 

will output

Array ([0] => C [1] => o [2] => m [3] => m [4] => e 
     [5] => n [6] => t [7] => [8] => � [9] => � 
     [10] => a [11] => [12] => v [13] => a [14] => ?) 

Array ([0] => C [1] => o [2] => m [3] => m [4] => e 
     [5] => n [6] => t [7] => [8] => ç [9] => a 
     [10] => [11] => v [12] => a [13] => ?) 
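
To make the difference concrete, here is a minimal sketch that looks at the bytes behind a single ç. It assumes PHP 7.0 or later (for the `\u{...}` escape) and a UTF-8 encoded string; the variable names are only illustrative.

```php
<?php
// "ç" is U+00E7, which UTF-8 encodes as the two bytes C3 A7.
$c = "\u{00E7}";

echo strlen($c), "\n";   // strlen counts bytes, not characters: 2
echo bin2hex($c), "\n";  // c3a7

// str_split cuts after every byte, so the character is torn in half.
$bytes = str_split($c);

// With the u modifier, PCRE treats the subject as UTF-8 and splits between characters.
$chars = preg_split('//u', $c, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), ' ', count($chars), "\n"; // 2 1
```

Each half produced by str_split is not valid UTF-8 on its own, which is why the terminal prints a replacement character for it.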
+0

Thank you for your answer. It works for special characters, but not for line breaks: Input: hé! oui? Array ([0] => h [1] => é [2] => ! [3] => [4] => [5] => o [6] => u [7] => i [8] => ?) – Zorkzyd 2012-04-29 14:51:33

+1

@Zorkzyd: It is actually working: positions 3 and 4 are \r and \n respectively (try 'ord($letters[3])' and 'ord($letters[4])'; you will get 13 and 10, which are the ASCII codes for '\r' and '\n'). – nico 2012-04-29 14:58:30

+0

Thanks for the explanation. Is it possible to "merge" \r\n in the output array? – Zorkzyd 2012-04-29 15:03:43
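
One possible way to keep \r\n together, offered as a sketch rather than something from the thread: match characters with preg_match_all instead of splitting, and try the two-character sequence \r\n before falling back to a single character. The helper name split_chars_keep_crlf is hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: split into characters, but keep "\r\n" as one element.
function split_chars_keep_crlf(string $s): array
{
    // Alternation is tried left to right, so "\r\n" wins over a lone "\r".
    // s: "." also matches newlines; u: pattern and subject are treated as UTF-8.
    if (preg_match_all('/\r\n|./su', $s, $m) === false) {
        return []; // matching failed, e.g. the input was not valid UTF-8
    }
    return $m[0];
}

$letters = split_chars_keep_crlf("h\u{00E9}!\r\noui?");
echo count($letters), "\n";        // 8
echo bin2hex($letters[3]), "\n";   // 0d0a, i.e. "\r\n" in a single element
```

A lone \r or \n still comes out as its own element, since only the exact pair is merged.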

2

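This is because PHP's str_split function is not multibyte-safe, i.e. it cannot handle Unicode correctly. You can use the following function instead, which is a multibyte-safe implementation of str_split (source: a user comment in the PHP documentation):

# PHP 7.4+ ships a built-in mb_str_split, so only define this one if it is missing.
if (!function_exists('mb_str_split')) {
    function mb_str_split($string) {
        # Split at every position that is not right after the start (^)
        # and not right before the end ($).
        return preg_split('/(?<!^)(?!$)/u', $string);
    }
}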
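
As a side note, newer PHP versions (7.4 and later, with the mbstring extension loaded) provide mb_str_split as a built-in. The sketch below prefers it when available and otherwise falls back to the preg_split approach from the other answer; the wrapper name split_chars is hypothetical, and UTF-8 input is assumed.

```php
<?php
// Hypothetical wrapper: use the built-in when present, else fall back to preg_split.
function split_chars(string $s): array
{
    if (function_exists('mb_str_split')) {
        // Built-in since PHP 7.4 (mbstring extension).
        return mb_str_split($s, 1, 'UTF-8');
    }
    // preg_split returns false on invalid UTF-8; normalise that to an empty array.
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

$letters = split_chars("Comment \u{00E7}a va?");
echo count($letters), "\n"; // 14
echo $letters[8], "\n";     // ç
```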

+0

Thanks Daan, but nico's answer seems easier :) – Zorkzyd 2012-04-29 15:05:55

+0

You're welcome! Good luck :) – Daan 2012-04-29 15:32:59