Issue - XML declaration allowed only at the start of the document

xml:19558: parser error : XML declaration allowed only at the start of the document

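Any solution? I'm using PHP's XMLReader to parse a large XML file, but I get this error. I know the file is not well-formed, but I don't think it's feasible to go through the file by hand and remove these extra declarations. Any ideas? Please help.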

2011-03-29 Aamir

If it's not well-formed, it's not XML. If it's not XML, XMLReader won't play nicely with it. – drudge 2011-03-29 22:11:07

The only problem with the file is the multiple declarations :( (<?xml version="1.0" encoding="UTF-8" standalone="no"?>). Any way out? – Aamir 2011-03-29 22:16:38

You need to remove the whitespace! How to identify and fix errors like this: https://www.youtube.com/watch?v=4jWhO07ICvw – 2016-11-28 13:44:15

Make sure there is no whitespace before the first tag. Try this:

<?php 
//Declarations 
$file = "data.txt"; //The file to read from. 

#Read the file 
$fp = fopen($file, "r"); //Open the file 
$data = ""; //Initialize variable to contain the file's content 
while(!feof($fp)) //Loop through the file, read it till the end. 
{ 
    $data .= fgets($fp, 1024); //append next kb to data 
} 
fclose($fp); //Close file 
#End read file 
$split = preg_split('/(?<=<\/xml>)(?!$)/', $data); //Split each xml occurence into its own string 

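foreach ($split as $sxml) //Loop through each xml string 
{ 
    $sxml = trim($sxml); //Whitespace left in front of the declaration would trigger the same parser error 
    if ($sxml === '') continue; //Skip empty chunks 
    $reader = new XMLReader(); //Initialize the reader 
    $reader->xml($sxml) or die("Could not load XML chunk"); //Load the current xml string 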
    while($reader->read()) //Read it 
    { 
     switch($reader->nodeType) 
     { 
      case XMLReader::ELEMENT: //Read element 
       if ($reader->name == 'record') 
       { 
        $dataa = $reader->readInnerXml(); //get contents for <record> tag. 
        echo $dataa; //Print it to screen. 
       } 
      break; 
     } 
    } 
    $reader->close(); //close reader 
} 
?>

Set the $file variable to the file you need. Note that I don't know whether this will work for a 4GB file. Tell me if it doesn't.

Edit: Here is another solution that should handle larger files better (it parses the file while reading it).

<?php 
set_time_limit(0); 
//Declarations 
$file = "data.txt"; //The file to read from. 

#Read the file 
$fp = fopen($file, "r") or die("Couldn't Open"); //Open the file 

$FoundXmlTagStep = 0; 
$FoundEndXMLTagStep = 0; 
$curXML = ""; 
$firstXMLTagRead = false; 
while(!feof($fp)) //Loop through the file, read it till the end. 
{ 
    $data = fgets($fp, 2); 
    if ($FoundXmlTagStep==0 && $data == "<") 
     $FoundXmlTagStep=1; 
    else if ($FoundXmlTagStep==1 && $data == "x") 
     $FoundXmlTagStep=2; 
    else if ($FoundXmlTagStep==2 && $data == "m") 
     $FoundXmlTagStep=3; 
    else if ($FoundXmlTagStep==3 && $data == "l") 
    { 
     $FoundXmlTagStep=4; 
     $firstXMLTagRead = true; 
    } 
    else if ($FoundXmlTagStep!=4) 
     $FoundXmlTagStep=0; 

    if ($FoundXmlTagStep==4) 
    { 
     if ($firstXMLTagRead) 
     { 
      $firstXMLTagRead = false; 
      $curXML = "<xm"; 
     } 
     $curXML .= $data; 

     //Start trying to match end of xml 
     if ($FoundEndXMLTagStep==0 && $data == "<") 
      $FoundEndXMLTagStep=1; 
     elseif ($FoundEndXMLTagStep==1 && $data == "/") 
      $FoundEndXMLTagStep=2; 
     elseif ($FoundEndXMLTagStep==2 && $data == "x") 
      $FoundEndXMLTagStep=3; 
     elseif ($FoundEndXMLTagStep==3 && $data == "m") 
      $FoundEndXMLTagStep=4; 
     elseif ($FoundEndXMLTagStep==4 && $data == "l") 
      $FoundEndXMLTagStep=5; 
     elseif ($FoundEndXMLTagStep==5 && $data == ">") 
     { 
      $FoundEndXMLTagStep=0; 
      $FoundXmlTagStep=0; 
      #finished Reading XML 
      ParseXML ($curXML); 
     } 
     else //Any other character restarts the end-tag match 
      $FoundEndXMLTagStep = ($data == "<") ? 1 : 0; 
    } 
} 
fclose($fp); //Close file 
function ParseXML ($xml) 
{ 
    //echo $sxml; 
    $reader = new XMLReader(); //Initialize the reader 
    $reader->xml($xml) or die("Could not load XML chunk"); //Load the current xml string 
    while($reader->read()) //Read it 
    { 
     switch($reader->nodeType) 
     { 
      case XMLReader::ELEMENT: //Read element 
       if ($reader->name == 'record') 
       { 
        $dataa = $reader->readInnerXml(); //get contents for <record> tag. 
        echo $dataa; //Print it to screen. 
       } 
      break; 
     } 
    } 
    $reader->close(); //close reader 
} 
?>
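Both snippets above look for a root element literally named <xml>. A variation, sketched here under assumptions, is to split the stream on the XML declaration itself, so the root element name does not matter. The function name splitXmlDocuments and the sample data are made up for illustration. The sketch assumes every embedded document starts with a declaration written as "<?xml" followed by a space, and that the file contains line breaks, so that fgets never has to buffer the whole file at once.

```php
<?php
// Hypothetical generator (not from the thread): yields one XML document at a
// time from a stream that is a concatenation of several XML documents.
// Assumption: each document begins with "<?xml" followed by a space.
function splitXmlDocuments($stream): Generator
{
    $marker = '<?xml ';
    $current = '';
    while (($line = fgets($stream)) !== false) {
        $offset = 0;
        // A declaration may sit mid-line, directly after a closing tag.
        while (($pos = strpos($line, $marker, $offset)) !== false) {
            $current .= substr($line, $offset, $pos - $offset);
            if (trim($current) !== '') {
                yield trim($current); // previous document is complete
            }
            $current = $marker;       // start collecting the next document
            $offset = $pos + strlen($marker);
        }
        $current .= substr($line, $offset);
    }
    if (trim($current) !== '') {
        yield trim($current);         // last document in the stream
    }
}

// Usage with an in-memory stream standing in for the real file.
$stream = fopen('php://memory', 'w+');
fwrite($stream, '<?xml version="1.0"?><a><record>1</record></a>'
    . '<?xml version="1.0"?><b>' . "\n"
    . '<record>2</record></b>' . "\n");
rewind($stream);

$records = [];
foreach (splitXmlDocuments($stream) as $xml) {
    $reader = new XMLReader();
    $reader->XML($xml);
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'record') {
            $records[] = $reader->readInnerXml();
        }
    }
    $reader->close();
}
fclose($stream);
print_r($records);
```

Only one embedded document is held in memory at a time, which may matter for multi-gigabyte inputs. A single embedded document that is itself very large would still need a different approach.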

2011-03-29 22:10:24 Ben

No dear, that is not the case. Actually this line (<?xml version="1.0" encoding="UTF-8" standalone="no"?>) is repeated multiple times in the file.. that is what the error report says. – Aamir 2011-03-29 22:12:14

You have … – Ben 2011-03-29 22:13:56

Yes, but it appears multiple times. How can I solve this? Something like removing these extra tags, but how? – Aamir 2011-03-29 22:17:37

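If there are multiple XML declarations, you probably have a concatenation of many XML files, and therefore more than one root element as well. It's not clear how you would parse them meaningfully.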

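Try hard to get the source of the XML to give you real XML first. If that doesn't work, see whether you can do some preprocessing to fix the XML before parsing it.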
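As an illustration of what such preprocessing might look like, here is a minimal sketch that strips every XML declaration and wraps what is left in one synthetic root element. The helper name mergeConcatenatedXml and the root name "documents" are made up for this example. It assumes the data fits in memory, that all parts use the same encoding, and that none of them carries a DOCTYPE.

```php
<?php
// Hypothetical helper (not from the thread): turn a concatenation of XML
// documents into a single well-formed document.
function mergeConcatenatedXml(string $raw, string $root = 'documents'): string
{
    // Remove every XML declaration, wherever it occurs. Requiring whitespace
    // after "<?xml" leaves processing instructions such as xml-stylesheet alone.
    $body = preg_replace('/<\?xml\s[^?]*\?>/', '', $raw) ?? $raw;

    // Wrap the remaining root elements in one synthetic root element.
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . "<$root>" . $body . "</$root>";
}

// Sample input: two complete documents glued together.
$raw = '<?xml version="1.0" encoding="UTF-8" standalone="no"?><a><record>1</record></a>'
    . "\n"
    . '<?xml version="1.0" encoding="UTF-8" standalone="no"?><b><record>2</record></b>';

$reader = new XMLReader();
$reader->XML(mergeConcatenatedXml($raw));
$records = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'record') {
        $records[] = $reader->readInnerXml();
    }
}
$reader->close();
print_r($records);
```

For a file too large to load as one string, the same idea would have to be applied while streaming.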

2011-03-29 22:13:25

Hmm.. could you please let me know how to remove these extra declarations? Any simple PHP code? Actually I'm very new to all this and just stuck here. – Aamir 2011-03-29 22:15:24

I see what you mean...! "Try hard to get the source of the XML to give you real XML first." – Aamir 2011-03-29 22:15:42

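Where are you getting the XML from? Can you talk to whoever is responsible for generating it? It is incorrect and should be fixed at the source. To repair the XML yourself, look into PHP string replacement. – 2011-03-29 22:16:39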

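Another possible cause of this problem is the Unicode file header (byte order mark). If your XML is encoded as UTF-8, the file content may start with the 3 bytes "EF BB BF". If you convert from a byte array to a string, these bytes may be misinterpreted. The solution is to write the byte array directly to the file, without calling getString on the byte array.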

ASCII: no file header
Unicode (UTF-16, little-endian): FF FE
UTF-8: EF BB BF
UTF-32 (little-endian): FF FE 00 00

Just open the file in UltraEdit and you can see these bytes.
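If a hex editor is not at hand, the same check can be done in PHP. The following is a minimal sketch that detects and drops a leading UTF-8 byte order mark before the data reaches the parser. The helper name stripUtf8Bom is made up for this example, and whether a BOM is actually what triggers the error in a given file is something to verify rather than assume.

```php
<?php
// Hypothetical helper (not from the thread): remove a leading UTF-8 byte
// order mark (EF BB BF) if the data starts with one.
function stripUtf8Bom(string $bytes): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($bytes, $bom, 3) === 0) {
        return substr($bytes, 3);
    }
    return $bytes; // no BOM, return the data unchanged
}

// Sample data: an XML document preceded by a UTF-8 BOM.
$withBom = "\xEF\xBB\xBF" . '<?xml version="1.0" encoding="UTF-8"?><root/>';
$clean = stripUtf8Bom($withBom);

echo bin2hex(substr($withBom, 0, 3)), "\n"; // hex of the first three bytes
echo substr($clean, 0, 5), "\n";            // first five characters after stripping
```

The comparison works on raw bytes, so it behaves the same regardless of any multibyte string settings.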

2014-03-31 18:36:42 kaven
