XML parser cuts strings containing accented characters

I am trying to parse an XML file, but when the file contains accented characters (é, à, ...), the PHP XML parser cuts the string.

function __construct(){ 
     $this->xml_parser = xml_parser_create("UTF-8"); 
     xml_set_object ( $this->xml_parser, $this); 
     xml_set_element_handler($this->xml_parser, "startTagArticle", "endTagArticle"); 
     xml_set_character_data_handler($this->xml_parser, "contentsArticle"); 
     }

If my file contains the string cccccékkkkkkéllllllll, it shows up in the web browser as ékkkkkélllllll, and I don't know why.

<?xml version="1.0" encoding="utf-8"?> 
<XML> 
     <TITRE>cccccékkkkkkéllllllll</TITRE> 
     <RESUME>Ceci est le premieré article de blog et l'aut</RESUME> 
     <CONTENT>Ceci l'aut est effectivement mon premier article de blog 
     et c'est un test 
     </CONTENT> 
     <FILE_COMMENTS>com1.xml</FILE_COMMENTS> 
     <VISIBLE>true</VISIBLE> 
     <TAG>Cool</TAG> 
     <TAG>article</TAG> 
</XML>

The basic parsing functions:

function startTagArticle($parser, $data){
    // Remember which element we are inside so contentsArticle() knows where to store the text.
    switch ($data){
        case "RESUME":
            $this->articleSection = 1;
            break;
        case "CONTENT":
            $this->articleSection = 2;
            break;
        case "FILE_COMMENTS":
            $this->articleSection = 3;
            break;
        case "VISIBLE":
            $this->articleSection = 4;
            break;
        case "TITRE":
            $this->articleSection = 5;
            break;
        case "TAG":
            $this->articleSection = 6;
            break;
        default:
            $this->articleSection = 0;
            break;
    }
}


/** Does not work **/
function contentsArticle($parser, $data){ 
     if ($this->articleSection == 1){ 
      $this->resumeArticleCourant = $data; 
     } 
     if ($this->articleSection == 2){ 
      $this->contentArticleCourrant = $data; 
     } 
     if ($this->articleSection == 3){ 
      $this->fichier_comArticleCourant = $data; 
      $this->comm = new commentaire(); 
      $this->comm->init($this->comm_rep.$data); 
     } 
     if ($this->articleSection == 4){ 
      $this->visibleArticleCourant = $data; 
     } 
     if ($this->articleSection == 5){ 
      $this->titreArticleCourant = $data; 
     } 
     if ($this->articleSection == 6){ 
      array_push($this->tag_array,$data); 
     } 
    }

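The strange thing is that if I replace it with the following contentsArticle function, where I changed = to .=, it works fine. The accented characters seem to cut/stop the XML stream.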

/** Works **/
function contentsArticle($parser, $data){ 
     if ($this->articleSection == 1){ 
      $this->resumeArticleCourant .= $data; 
     } 
     if ($this->articleSection == 2){ 
      $this->contentArticleCourrant .= $data; 
     } 
     if ($this->articleSection == 3){ 
      $this->fichier_comArticleCourant = $data; 
      $this->comm = new commentaire(); 
      $this->comm->init($this->comm_rep.$data); 
     } 
     if ($this->articleSection == 4){ 
      $this->visibleArticleCourant = $data; 
     } 
     if ($this->articleSection == 5){ 
      $this->titreArticleCourant .= $data; 
     } 
     if ($this->articleSection == 6){ 
      array_push($this->tag_array,$data); 
     } 
    }

Source

2011-12-24 psic

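Are you sending the correct header to the browser? For example: header("Content-Type: text/xml; charset=utf-8"); If you view the source in the browser, do you see the full string or the cut string? If you see the cut string, the problem is in the parsing; otherwise it is probably an encoding problem. – Yaniro 2011-12-24 19:40:33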

If I open my XML file directly in the browser, I see the full string... so I guess it is more of a parsing problem. – psic 2011-12-24 21:39:20

Same behavior here, and solved with the same diagnosis. I changed '=' to '.=' and the word is no longer split. No idea why. – Sebastian 2012-11-02 22:07:46
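A likely explanation, offered as a sketch rather than a confirmed diagnosis of the code in the question: PHP's XML parser may deliver the text of a single element to the character data handler in several pieces, and the PHP manual notes this can happen for non-ASCII strings. With `=`, each call overwrites the previous piece, so only the last fragment survives; with `.=`, the pieces are concatenated. The class below is hypothetical (`TitleCollector` and its members are not from the original post). It resets a buffer in the start handler, appends in the character data handler, and stores the finished text in the end handler.

```php
<?php
// Hypothetical helper class for illustration; the names are not from the original post.
class TitleCollector
{
    public $titles = [];      // completed <TITRE> values
    private $buffer = '';     // text collected for the element currently open
    private $inTitle = false; // true while the parser is inside a <TITRE> element

    public function parse($xml)
    {
        $parser = xml_parser_create('UTF-8');
        xml_set_element_handler($parser, [$this, 'startTag'], [$this, 'endTag']);
        xml_set_character_data_handler($parser, [$this, 'contents']);
        // The third argument marks this as the final (and only) chunk of input.
        return xml_parse($parser, $xml, true) === 1;
    }

    public function startTag($parser, $name, $attrs)
    {
        // Case folding is on by default, so element names arrive in upper case.
        if ($name === 'TITRE') {
            $this->inTitle = true;
            $this->buffer = ''; // start from an empty buffer for each element
        }
    }

    public function contents($parser, $data)
    {
        // May be called more than once per element, so append instead of assigning.
        if ($this->inTitle) {
            $this->buffer .= $data;
        }
    }

    public function endTag($parser, $name)
    {
        if ($name === 'TITRE') {
            $this->titles[] = $this->buffer; // the element is complete only here
            $this->inTitle = false;
        }
    }
}
```

Resetting the buffer in the start handler also avoids a side effect of a bare `.=`: without a reset, the text of a second article would be appended to the first one's.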

Try:

 
//add 
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, "ISO-8859-1");

//and the encoding is 
<?xml version="1.0" encoding="utf-8"?>

Hope it helps.

Source

2011-12-25 01:16:34

Thanks, but it doesn't work. It just replaces the "é" character. – psic 2011-12-25 01:44:42
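One possible reason the "é" gets replaced once that option is set: `XML_OPTION_TARGET_ENCODING` controls the encoding of the strings handed to the handlers. With "ISO-8859-1", "é" arrives as a single Latin-1 byte, which is not valid UTF-8 on its own, so a page served as UTF-8 cannot display it. The sketch below uses a hypothetical helper, `parseTextAs` (not from the original post), to print the raw bytes the handler receives under each target encoding.

```php
<?php
// Hypothetical helper: parse $xml and return all character data as raw bytes
// in the requested target encoding.
function parseTextAs($xml, $targetEncoding)
{
    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);

    $text = '';
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data; // append: the handler may fire several times per element
    });

    xml_parse($parser, $xml, true);
    return $text;
}

$xml = '<?xml version="1.0" encoding="utf-8"?><T>é</T>';

// Hex dumps make the difference visible regardless of the terminal's encoding.
echo bin2hex(parseTextAs($xml, 'UTF-8')), "\n";
echo bin2hex(parseTextAs($xml, 'ISO-8859-1')), "\n";
```

If the page is sent as UTF-8, keeping the target encoding at UTF-8 and fixing the handler instead looks like the safer route.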

Is your XML encoded in UTF-8? – 2011-12-25 01:46:14

I believe so. But how can I be sure? – psic 2011-12-25 01:47:35
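One way to check, as a minimal sketch: read the file's bytes and test whether they form well-formed UTF-8. The helper name `isValidUtf8` is hypothetical. It relies on PCRE's `/u` modifier, which makes `preg_match()` fail on subjects that are not valid UTF-8.

```php
<?php
// Hypothetical helper: returns true when $bytes is well-formed UTF-8.
function isValidUtf8($bytes)
{
    // With the /u modifier, PCRE rejects subjects that are not valid UTF-8,
    // so preg_match() returns false instead of 1 for malformed input.
    return preg_match('//u', $bytes) === 1;
}
```

For a file on disk this would be called as `isValidUtf8(file_get_contents($path))`, where `$path` is the XML file. A file saved as ISO-8859-1 that contains "é" fails the check, because a lone 0xE9 byte is not valid UTF-8. A pure-ASCII file passes either way, so the check is only informative when accented characters are present.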
