2011-10-10 126 views
0

MongoDB UTF-8 exception

I'm wondering why I run into trouble when I insert a string like 'hey hey %80' into the DB. The '%80' still produces an exception:

Uncaught exception 'MongoException' with message 'non-utf8 string: hey hey �' 

What do I need to do? :( Is %80 not a UTF-8 character? :O

The JS that passes the string to the controller:

function new_pool_post(_url, _data, _starter) {
    $.ajax({
        type: 'POST',
        data: _data,
        dataType: 'json',
        url: _url,
        beforeSend: function () {
            $('.ajax-loading').show();
            $(_starter).attr('disabled', 'disabled');
        },
        error: function () {
            $('.ajax-loading').hide();
            $(_starter).removeAttr('disabled');
        },
        success: function (json) {
            $('.ajax-loading').hide();
            $(_starter).removeAttr('disabled');
            if (json) {
                $('.pool-append').prepend(json.pool_post);
            }
        }
    });
}

The controller receiving the data:

$id_project = $this->input->post('id_project', true);
$id_user = $this->session->userdata('user_id');
$pool_post = $this->input->post('pool_post', true);

The controller sanitizing the data:

public function xss_clean($str, $is_image = FALSE) 
    { 
     /* 
     * Is the string an array? 
     * 
     */ 
     if (is_array($str)) 
     { 
      while (list($key) = each($str)) 
      { 
       $str[$key] = $this->xss_clean($str[$key]); 
      } 

      return $str; 
     } 
       /*Remove non utf-8; chars*/ 

       $str = htmlspecialchars(urlencode(preg_replace('/[\x00-\x1F\x80-\xFF]/','',$str))); 

     /* 
     * Remove Invisible Characters 
     */ 
     $str = remove_invisible_characters($str); 

     // Validate Entities in URLs 
     $str = $this->_validate_entities($str); 

     /* 
     * URL Decode 
     * 
     * Just in case stuff like this is submitted: 
     * 
     * <a href="http://%77%77%77%2E%67%6F%6F%67%6C%65%2E%63%6F%6D">Google</a> 
     * 
     * Note: Use rawurldecode() so it does not remove plus signs 
     * 
     */ 
     $str = rawurldecode($str); 

     /* 
     * Convert character entities to ASCII 
     * 
     * This permits our tests below to work reliably. 
     * We only convert entities that are within tags since 
     * these are the ones that will pose security problems. 
     * 
     */ 

     $str = preg_replace_callback("/[a-z]+=([\'\"]).*?\\1/si", array($this, '_convert_attribute'), $str); 

     $str = preg_replace_callback("/<\w+.*?(?=>|<|$)/si", array($this, '_decode_entity'), $str); 

     /* 
     * Remove Invisible Characters Again! 
     */ 
     $str = remove_invisible_characters($str); 

     /* 
     * Convert all tabs to spaces 
     * 
     * This prevents strings like this: ja vascript 
     * NOTE: we deal with spaces between characters later. 
     * NOTE: preg_replace was found to be amazingly slow here on 
     * large blocks of data, so we use str_replace. 
     */ 

     if (strpos($str, "\t") !== FALSE) 
     { 
      $str = str_replace("\t", ' ', $str); 
     } 

     /* 
     * Capture converted string for later comparison 
     */ 
     $converted_string = $str; 

     // Remove Strings that are never allowed 
     $str = $this->_do_never_allowed($str); 

     /* 
     * Makes PHP tags safe 
     * 
     * Note: XML tags are inadvertently replaced too: 
     * 
     * <?xml 
     * 
     * But it doesn't seem to pose a problem. 
     */ 
     if ($is_image === TRUE) 
     { 
      // Images have a tendency to have the PHP short opening and 
      // closing tags every so often so we skip those and only 
      // do the long opening tags. 
      $str = preg_replace('/<\?(php)/i', "&lt;?\\1", $str); 
     } 
     else 
     { 
      $str = str_replace(array('<?', '?'.'>'), array('&lt;?', '?&gt;'), $str); 
     } 

     /* 
     * Compact any exploded words 
     * 
     * This corrects words like: j a v a s c r i p t 
     * These words are compacted back to their correct state. 
     */ 
     $words = array(
       'javascript', 'expression', 'vbscript', 'script', 
       'applet', 'alert', 'document', 'write', 'cookie', 'window' 
      ); 

     foreach ($words as $word) 
     { 
      $temp = ''; 

      for ($i = 0, $wordlen = strlen($word); $i < $wordlen; $i++) 
      { 
       $temp .= substr($word, $i, 1)."\s*"; 
      } 

      // We only want to do this when it is followed by a non-word character 
      // That way valid stuff like "dealer to" does not become "dealerto" 
      $str = preg_replace_callback('#('.substr($temp, 0, -3).')(\W)#is', array($this, '_compact_exploded_words'), $str); 
     } 

     /* 
     * Remove disallowed Javascript in links or img tags 
     * We used to do some version comparisons and use of stripos for PHP5, 
     * but it is dog slow compared to these simplified non-capturing 
     * preg_match(), especially if the pattern exists in the string 
     */ 
     do 
     { 
      $original = $str; 

      if (preg_match("/<a/i", $str)) 
      { 
       $str = preg_replace_callback("#<a\s+([^>]*?)(>|$)#si", array($this, '_js_link_removal'), $str); 
      } 

      if (preg_match("/<img/i", $str)) 
      { 
       $str = preg_replace_callback("#<img\s+([^>]*?)(\s?/?>|$)#si", array($this, '_js_img_removal'), $str); 
      } 

      if (preg_match("/script/i", $str) OR preg_match("/xss/i", $str)) 
      { 
       $str = preg_replace("#<(/*)(script|xss)(.*?)\>#si", '[removed]', $str); 
      } 
     } 
     while($original != $str); 

     unset($original); 

     // Remove evil attributes such as style, onclick and xmlns 
     $str = $this->_remove_evil_attributes($str, $is_image); 

     /* 
     * Sanitize naughty HTML elements 
     * 
     * If a tag containing any of the words in the list 
     * below is found, the tag gets converted to entities. 
     * 
     * So this: <blink> 
     * Becomes: &lt;blink&gt; 
     */ 
     $naughty = 'alert|applet|audio|basefont|base|behavior|bgsound|blink|body|embed|expression|form|frameset|frame|head|html|ilayer|iframe|input|isindex|layer|link|meta|object|plaintext|style|script|textarea|title|video|xml|xss'; 
     $str = preg_replace_callback('#<(/*\s*)('.$naughty.')([^><]*)([><]*)#is', array($this, '_sanitize_naughty_html'), $str); 

     /* 
     * Sanitize naughty scripting elements 
     * 
     * Similar to above, only instead of looking for 
     * tags it looks for PHP and JavaScript commands 
     * that are disallowed. Rather than removing the 
     * code, it simply converts the parenthesis to entities 
     * rendering the code un-executable. 
     * 
     * For example: eval('some code') 
     * Becomes:  eval&#40;'some code'&#41; 
     */ 
     $str = preg_replace('#(alert|cmd|passthru|eval|exec|expression|system|fopen|fsockopen|file|file_get_contents|readfile|unlink)(\s*)\((.*?)\)#si', "\\1\\2&#40;\\3&#41;", $str); 


     // Final clean up 
     // This adds a bit of extra precaution in case 
     // something got through the above filters 
     $str = $this->_do_never_allowed($str); 

     /* 
     * Images are Handled in a Special Way 
     * - Essentially, we want to know that after all of the character 
     * conversion is done whether any unwanted, likely XSS, code was found. 
     * If not, we return TRUE, as the image is clean. 
     * However, if the string post-conversion does not matched the 
     * string post-removal of XSS, then it fails, as there was unwanted XSS 
     * code found and removed/changed during processing. 
     */ 

     if ($is_image === TRUE) 
     { 
      return ($str == $converted_string) ? TRUE: FALSE; 
     } 

     log_message('debug', "XSS Filtering completed"); 
     return $str; 
    } 

The controller passes the cleaned data to the model, and the model inserts it into MongoDB: nothing more than that... :)
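A minimal sketch of what probably happens when the literal text '%80' reaches rawurldecode() unchanged, as it would in an unmodified xss_clean(): the three ASCII characters are decoded into the single byte 0x80, which is not a valid UTF-8 sequence on its own. This assumes the mbstring extension is loaded. to_valid_utf8() is a hypothetical helper name, not part of CodeIgniter or the Mongo driver.

```php
<?php
// Literal text typed by the user; "%80" is three ASCII characters here.
$input = 'hey hey %80';

// rawurldecode() turns "%80" into the single byte 0x80.
$decoded = rawurldecode($input);

// 0x80 is a UTF-8 continuation byte; alone it is not a valid sequence.
var_dump(mb_check_encoding($input, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($decoded, 'UTF-8')); // bool(false)
var_dump(bin2hex(substr($decoded, -1)));        // string(2) "80"

// Hypothetical helper: make a string safe to hand to the driver.
function to_valid_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid, leave untouched
    }
    // Converting UTF-8 to UTF-8 substitutes each invalid byte.
    return mb_convert_encoding($s, 'UTF-8', 'UTF-8');
}

var_dump(mb_check_encoding(to_valid_utf8($decoded), 'UTF-8')); // bool(true)
```

Checking with mb_check_encoding() right before the insert would at least show whether the string is already broken when it leaves the controller.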

+1

Even if you send the query through a URI request and it isn't properly encoded, '%80' evaluates to ASCII 'P'. Please post some complete snippets.

+0

I'm using the CodeIgniter PHP framework and passing the string through an XHR request with the POST method – sbaaaang

+1

With urlencode() it's fine – sbaaaang

Answers

2

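I had a related problem.

E.g.

ucfirst for UTF-8 needs mb_ucfirst('直升机', 'utf-8');

I think the problem in your case is that substr needs to be replaced with mb_substr.

Otherwise:

maybe use iconv at the start to convert to ISO-8859-1, and iconv back to UTF-8 when writing to the database.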

+0

Hmm, I don't think I understand... :P Can you explain it better? Did you have the same problem as me? – sbaaaang

+1

Replace $temp .= substr($word, $i, 1)."\s*"; with mb_substr – user956584

+0

Hmm, done, but nothing changed – sbaaaang
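To illustrate the substr versus mb_substr point, here is a small sketch assuming the mbstring extension is available. substr() counts bytes and can cut a multibyte character in half, while mb_substr() counts characters. The words in the $words array of xss_clean() are plain ASCII, where both functions return the same thing, which would be consistent with the change making no difference there.

```php
<?php
// "héllo" written with explicit bytes: "é" is 0xC3 0xA9 in UTF-8.
$text = "h\xC3\xA9llo";

// substr() works on bytes: 2 bytes = "h" plus half of "é".
$byteCut = substr($text, 0, 2);

// mb_substr() works on characters when told the encoding.
$charCut = mb_substr($text, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
var_dump(strlen($charCut));                     // int(3)

// For an ASCII-only word the two functions agree.
var_dump(substr('script', 2, 1) === mb_substr('script', 2, 1, 'UTF-8')); // bool(true)
```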

-1

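To prevent this problem, you can put

header("Content-Type: text/html; charset=UTF-8"); 

at the top of the PHP file.
I found the solution in this Stack Overflow post, and it worked for me when migrating a MySQL DB with Latin special characters to MongoDB.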
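Note that the header only declares the encoding of the response (browsers generally submit forms in the page's encoding). It does not re-encode bytes that are already stored somewhere else. For the migration case, a hedged sketch of converting Latin-1 data explicitly before inserting, assuming the source column really is ISO-8859-1 and that mbstring is loaded:

```php
<?php
// "café" as stored in an ISO-8859-1 column: "é" is the single byte 0xE9.
$latin1 = "caf\xE9";

// Read as UTF-8, 0xE9 starts a multibyte sequence that never completes.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)

// Re-encode explicitly; the source encoding is an assumption about the data.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump(mb_check_encoding($utf8, 'UTF-8')); // bool(true)
var_dump(bin2hex($utf8));                    // string(10) "636166c3a9"
```

iconv('ISO-8859-1', 'UTF-8', $latin1) should give the same bytes if the iconv extension is preferred, as suggested in the other answer.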