2008-08-09
2

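Before reading anything else, please take some time to read the original thread, "How do I encode an XML file as XFDL (base64-gzip)?"

Overview: an .xfdl file is a gzip-compressed .xml file that has then been base64-encoded. I want to decode the .xfdl into XML, modify it, and then re-encode it back into an .xfdl file.

xfdl > xml.gz > xml > xml.gz > xfdl

I have been able to take an .xfdl file and decode its base64 using uudeview: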

uudeview -i yourform.xfdl 

Then decompress it with gunzip:

gunzip -S "" < UNKNOWN.001 > yourform-unpacked.xml 

The resulting XML is 100% readable and looks wonderful. Without modifying the XML, I should be able to re-compress it using gzip:

gzip yourform-unpacked.xml 

Then re-encode it in base64:

base64 -e yourform-unpacked.xml.gz yourform_reencoded.xfdl 

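If my thinking is correct, the original file and the re-encoded file should be identical. However, when I compare yourform.xfdl and yourform_reencoded.xfdl side by side, they do not match. Also, the original file can be opened in an .xfdl viewer (http://www.grants.gov/help/download_software.jsp#pureedge). The viewer says the re-encoded xfdl is unreadable.

I have also tried uuenview to re-encode in base64, and it produces the same result. Any help would be appreciated.

Answers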

0

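Different implementations of the gzip algorithm will always produce slightly different but still correct files. The compression level used for the original file may also differ from the one you are running with.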

2

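As far as I know, you cannot find out the compression level of an already compressed file. When you compress the file, you can specify the compression level with -#, where # is from 1 to 9 (1 is the fastest compression, 9 produces the smallest file). In practice you should never compare a compressed file with one that has been extracted and recompressed, because slight variations crop up easily. In your case I would compare the base64-encoded versions instead of the gzipped versions.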
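To illustrate the point about compression levels, here is a minimal Python sketch. The sample XML is a made-up stand-in for a real form, and Python's `gzip` module is only one of many gzip implementations, so the exact sizes it prints say nothing about what the original form's author used.

```python
import gzip

# Hypothetical stand-in for the unpacked form XML.
xml = b"<XFDL><page sid='PAGE1'>" + b"<field/>" * 200 + b"</page></XFDL>"

# Same input, two compression levels; mtime=0 keeps the header timestamp fixed.
fast = gzip.compress(xml, compresslevel=1, mtime=0)
best = gzip.compress(xml, compresslevel=9, mtime=0)

print(len(fast), len(best))  # sizes may differ between levels
print(fast == best)          # the compressed bytes are not guaranteed to match

# What can be compared reliably is the decompressed payload.
assert gzip.decompress(fast) == xml
assert gzip.decompress(best) == xml
```

Comparing the decompressed payloads, rather than the compressed or encoded bytes, is the check that stays valid across levels and implementations.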

0

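Interesting, I'll give it a shot. The variations aren't slight, however. The newly encoded file is longer, and when comparing the before and after binaries, the data hardly matches at all.

Before (first three lines):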

H4sIAAAAAAAAC+19eZOiyNb3/34K3r4RT/WEU40ssvTtrhuIuKK44Bo3YoJdFAFZ3D79C6hVVhUq 
dsnUVN/qmIkSOLlwlt/JPCfJ/PGf9dwAlorj6pb58wv0LfcFUEzJknVT+/ml2uXuCSJP3kNf/vOQ 
+TEsFVkgoDfdn18mnmd/B8HVavWt5TsKI2vKN8magyENiH3Lf9kRfpd817PmF+jpiOhQRFZcXTMV 

After (first three lines):

H4sICJ/YnEgAAzEyNDQ2LTExNjk2NzUueGZkbC54bWwA7D1pU+JK19/9FV2+H5wpByEhJMRH 
uRUgCMom4DBYt2oqkAZyDQlmQZ1f/3YSNqGzKT3oDH6RdE4vOXuf08vFP88TFcygYSq6dnlM 
naWOAdQGuqxoo8vjSruRyGYzfII6/id3dPGjVKwCBK+Zl8djy5qeJ5NPT09nTduAojyCZwN9 

As you can see, the H4sI matches up; after that it's pandemonium.

+0

But unless you use exactly the same gzip implementation, you can only count on H4sI being the same. "Pandemonium" is normal :-) – 2011-03-29 08:04:11
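The reason H4sI is stable can be shown in a few lines of Python. This is just a sketch using the standard `base64` module: every gzip stream starts with the two magic bytes 0x1f 0x8b followed by the compression method byte 0x08 (deflate), and three bytes map to exactly four base64 characters.

```python
import base64

GZIP_MAGIC = bytes([0x1F, 0x8B])  # gzip ID1, ID2 (RFC 1952)
DEFLATE = bytes([0x08])           # CM = 8, "deflate"

prefix = base64.b64encode(GZIP_MAGIC + DEFLATE)
print(prefix)  # b'H4sI'
```

The very next byte is the gzip flags field, which is why two files can already diverge at the fifth base64 character.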

1

You need to put the following line at the beginning of the XFDL file:

application/vnd.xfdl; content-encoding="base64-gzip"

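After you have generated the base64-encoded file, open it in a text editor and paste the line above on the first line. Make sure the base64 block starts at the beginning of the second line.

Save it and try it in the Viewer! If it still does not work, the changes made to the XML may have made it non-compliant in some way. In that case, after the XML has been modified but before it has been gzipped and base64-encoded, save it with an .xfdl file extension and try opening it with the Viewer tool. The Viewer should be able to parse and display the uncompressed, unencoded file if it is in valid XFDL format.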
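The header-plus-payload layout described above can be sketched in Python as follows. The function names `encode_xfdl` and `decode_xfdl` are hypothetical, and the sketch assumes the Viewer accepts base64 wrapped at 76 characters per line, which is what `base64.encodebytes` produces; that has not been verified against the Viewer here.

```python
import base64
import gzip

# First line of the file, as quoted in the answer above.
XFDL_HEADER = b'application/vnd.xfdl; content-encoding="base64-gzip"\n'


def encode_xfdl(xml_bytes: bytes) -> bytes:
    """Gzip the XML, base64-encode it, and prepend the header line."""
    compressed = gzip.compress(xml_bytes)
    return XFDL_HEADER + base64.encodebytes(compressed)


def decode_xfdl(xfdl_bytes: bytes) -> bytes:
    """Strip the header line, then undo base64 and gzip."""
    header, _, body = xfdl_bytes.partition(b"\n")
    if header.strip() != XFDL_HEADER.strip():
        raise ValueError("unexpected XFDL header: %r" % header)
    # b64decode ignores the newlines between base64 lines by default.
    return gzip.decompress(base64.b64decode(body))


sample = b"<XFDL>example</XFDL>"  # hypothetical content
assert decode_xfdl(encode_xfdl(sample)) == sample
```

Because the header is written as its own line, the base64 block automatically starts at the beginning of the second line.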

0

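gzip puts the filename in the file header, so the length of a gzipped file varies with the filename of the uncompressed file.

If gzip acts on a stream, the filename is omitted and the file is a bit shorter, so the following should work:

gzip < yourform-unpacked.xml > yourform-unpacked.xml.gz

Then re-encode in base64:

base64 -e yourform-unpacked.xml.gz yourform_reencoded.xfdl

Maybe this will produce a file of the same length.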
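The filename-in-header explanation can be checked against the two samples quoted earlier in the thread. The sketch below decodes just the start of each sample and reads the fixed gzip header fields defined in RFC 1952. The helper name `gzip_header_info` is made up for illustration, and the code assumes the FEXTRA flag is not set, so the name field directly follows the 10-byte header.

```python
import base64
import struct


def gzip_header_info(b64_prefix: str) -> dict:
    """Decode the fixed 10-byte gzip header (RFC 1952) from a base64 prefix."""
    raw = base64.b64decode(b64_prefix)
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", raw[:10])
    info = {"magic": magic, "flags": flags, "mtime": mtime, "name": None}
    if flags & 0x08:  # FNAME: zero-terminated original file name follows
        # Assumes FEXTRA (0x04) is not set, so the name starts at offset 10.
        end = raw.index(b"\x00", 10)
        info["name"] = raw[10:end].decode("latin-1")
    return info


# Leading characters of the "before" and "after" samples from the thread.
before = gzip_header_info("H4sIAAAAAAAAC+19")
after = gzip_header_info("H4sICJ/YnEgAAzEyNDQ2LTExNjk2NzUueGZkbC54bWwA")
print(before)
print(after)
```

In the output, the "before" sample has no flags and a zero timestamp, while the "after" sample has the FNAME flag set and carries both a timestamp and an embedded file name. That alone accounts for the two base64 texts diverging right after H4sI.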

1

Check these out:

http://www.ourada.org/blog/archives/375

http://www.ourada.org/blog/archives/390

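They're in Python, not Ruby, but they should get you pretty close.

The algorithm is actually for files with the header 'application/x-xfdl; content-encoding="asc-gzip"' rather than 'application/vnd.xfdl; content-encoding="base64-gzip"'. The good news is that PureEdge (aka IBM Lotus Forms) will open that format with no problem.

Then to top it off, here's a base64-gzip decode (in Python) so you can make the full round trip: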

import base64
import zlib

# filename is the path to the .xfdl file
with open(filename, 'r') as f:
    header = f.readline()
    if header == 'application/vnd.xfdl; content-encoding="base64-gzip"\n':
        decoded = b''
        for line in f:
            decoded += base64.b64decode(line.encode("ISO-8859-1"))
        # MAX_WBITS + 16 makes zlib expect a gzip header and trailer
        xml = zlib.decompress(decoded, zlib.MAX_WBITS + 16)
+0

(That's not my blog, by the way.) And credit for the MAX_WBITS magic: http://stackoverflow.com/questions/1838699/how-can-i-decompress-a-gzip-stream-with-zlib – CrazyPyro 2011-02-16 21:49:59
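For anyone wondering what the MAX_WBITS + 16 argument changes, here is a small sketch using only the standard library. The payload is a made-up placeholder. With the default window-bits setting, `zlib.decompress` expects a zlib wrapper and rejects gzip data; adding 16 tells it to expect the gzip header and trailer instead.

```python
import gzip
import zlib

payload = b"<XFDL>example</XFDL>"  # hypothetical content
gz = gzip.compress(payload)

# wbits = MAX_WBITS + 16 tells zlib to expect a gzip header and trailer.
assert zlib.decompress(gz, zlib.MAX_WBITS + 16) == payload

# With the default wbits, zlib expects a zlib header and rejects gzip data.
try:
    zlib.decompress(gz)
except zlib.error as exc:
    print("plain zlib.decompress failed:", exc)
```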

1

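I did this in Java with the help of the Base64 class from http://iharder.net/base64

I've been working on an application to do form manipulation in Java. I decode the file, create a DOM document from the XML, then write it back to file.

My Java code to read the file looks like this: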

public XFDLDocument(String inputFile) 
     throws IOException, 
      ParserConfigurationException, 
      SAXException 

{ 
    fileLocation = inputFile; 

    try{ 

     //create file object 
     File f = new File(inputFile); 
     if(!f.exists()) { 
      throw new IOException("Specified File could not be found!"); 
     } 

     //open file stream from file 
     FileInputStream fis = new FileInputStream(inputFile); 

     //Skip past the MIME header 
     fis.skip(FILE_HEADER_BLOCK.length()); 

     //Decode from base 64     
     Base64.InputStream bis = new Base64.InputStream(fis, 
       Base64.DECODE); 

     //UnZIP the resulting stream 
     GZIPInputStream gis = new GZIPInputStream(bis); 

     DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); 
     DocumentBuilder db = dbf.newDocumentBuilder(); 
     doc = db.parse(gis); 

     gis.close(); 
     bis.close(); 
     fis.close(); 

    } 
    catch (ParserConfigurationException pce) { 
     throw new ParserConfigurationException("Error parsing XFDL from file."); 
    } 
    catch (SAXException saxe) { 
     throw new SAXException("Error parsing XFDL into XML Document."); 
    } 
} 

My Java code to write the file to disk looks like this:

/** 
    * Saves the current document to the specified location 
    * @param destination Desired destination for the file. 
    * @param asXML True if the output should be un-encoded XML rather than Base64/GZIP 
    * @throws IOException File cannot be created at specified location 
    * @throws TransformerConfigurationException 
    * @throws TransformerException 
    */ 
    public void saveFile(String destination, boolean asXML) 
     throws IOException, 
      TransformerConfigurationException, 
      TransformerException 
     { 

     BufferedWriter bf = new BufferedWriter(new FileWriter(destination)); 
     bf.write(FILE_HEADER_BLOCK); 
     bf.newLine(); 
     bf.flush(); 
     bf.close(); 

     OutputStream outStream; 
     if(!asXML) { 
      outStream = new GZIPOutputStream(
       new Base64.OutputStream(
         new FileOutputStream(destination, true))); 
     } else { 
      outStream = new FileOutputStream(destination, true); 
     } 

     Transformer t = TransformerFactory.newInstance().newTransformer(); 
     t.transform(new DOMSource(doc), new StreamResult(outStream)); 

     outStream.flush(); 
     outStream.close();  
    } 

Hope that helps.

1

I've been working on something like this, and it should work for PHP. You must have a writable tmp folder, and the PHP file must be named example.php!

<?php 
    function gzdecode($data) { 
     $len = strlen($data); 
     if ($len < 18 || strcmp(substr($data,0,2),"\x1f\x8b")) { 
      echo "FILE NOT GZIP FORMAT"; 
      return null; // Not GZIP format (See RFC 1952) 
     } 
     $method = ord(substr($data,2,1)); // Compression method 
     $flags = ord(substr($data,3,1)); // Flags 
     if (($flags & 31) != $flags) { 
      // Reserved bits are set -- NOT ALLOWED by RFC 1952 
      echo "RESERVED BITS ARE SET. VERY BAD"; 
      return null; 
     } 
     // NOTE: $mtime may be negative (PHP integer limitations) 
     $mtime = unpack("V", substr($data,4,4)); 
     $mtime = $mtime[1]; 
     $xfl = substr($data,8,1); 
     $os = substr($data,9,1); // OS byte sits at offset 9 (RFC 1952) 
     $headerlen = 10; 
     $extralen = 0; 
     $extra  = ""; 
     if ($flags & 4) { 
      // 2-byte length prefixed EXTRA data in header 
      if ($len - $headerlen - 2 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      // XLEN follows the 10-byte fixed header 
      $extralen = unpack("v",substr($data,$headerlen,2)); 
      $extralen = $extralen[1]; 
      if ($len - $headerlen - 2 - $extralen < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      $extra = substr($data,$headerlen + 2,$extralen); 
      $headerlen += 2 + $extralen; 
     } 

     $filenamelen = 0; 
     $filename = ""; 
     if ($flags & 8) { 
      // C-style string file NAME data in header 
      if ($len - $headerlen - 1 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      // Search for the terminating NUL from the current header offset 
      $filenamelen = strpos(substr($data,$headerlen),chr(0)); 
      if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      $filename = substr($data,$headerlen,$filenamelen); 
      $headerlen += $filenamelen + 1; 
     } 

     $commentlen = 0; 
     $comment = ""; 
     if ($flags & 16) { 
      // C-style string COMMENT data in header 
      if ($len - $headerlen - 1 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      // Search for the terminating NUL from the current header offset 
      $commentlen = strpos(substr($data,$headerlen),chr(0)); 
      if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid header format 
      } 
      $comment = substr($data,$headerlen,$commentlen); 
      $headerlen += $commentlen + 1; 
     } 

     $headercrc = ""; 
     if ($flags & 1) { 
      // 2-bytes (lowest order) of CRC32 on header present 
      if ($len - $headerlen - 2 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      $calccrc = crc32(substr($data,0,$headerlen)) & 0xffff; 
      $headercrc = unpack("v", substr($data,$headerlen,2)); 
      $headercrc = $headercrc[1]; 
      if ($headercrc != $calccrc) { 
       echo "BAD CRC"; 
       return false; // Bad header CRC 
      } 
      $headerlen += 2; 
     } 

     // GZIP FOOTER - These may be negative due to PHP's integer limitations 
     $datacrc = unpack("V",substr($data,-8,4)); 
     $datacrc = $datacrc[1]; 
     $isize = unpack("V",substr($data,-4)); 
     $isize = $isize[1]; 

     // Perform the decompression: 
     $bodylen = $len-$headerlen-8; 
     if ($bodylen < 1) { 
      // This should never happen - IMPLEMENTATION BUG! 
      echo "BIG OOPS"; 
      return null; 
     } 
     $body = substr($data,$headerlen,$bodylen); 
     $data = ""; 
     if ($bodylen > 0) { 
      switch ($method) { 
       case 8: 
        // Currently the only supported compression method: 
        $data = gzinflate($body); 
        break; 
       default: 
        // Unknown compression method 
        echo "UNKNOWN COMPRESSION METHOD"; 
        return false; 
      } 
     } else { 
      // I'm not sure if zero-byte body content is allowed. 
      // Allow it for now... Do nothing... 
      echo "ITS EMPTY"; 
     } 

     // Verify decompressed size and CRC32: 
     // NOTE: This may fail with large data sizes depending on how 
     //  PHP's integer limitations affect strlen() since $isize 
     //  may be negative for large sizes. 
     if ($isize != strlen($data) || crc32($data) != $datacrc) { 
      // Bad format! Length or CRC doesn't match! 
      echo "LENGTH OR CRC DO NOT MATCH"; 
      return false; 
     } 
     return $data; 
    } 
    echo "<html><head></head><body>"; 
    if (empty($_REQUEST['upload'])) { 
     echo <<<_END 
    <form enctype="multipart/form-data" action="example.php" method="POST"> 
    <input type="hidden" name="MAX_FILE_SIZE" value="100000" /> 
    <table> 
    <th> 
    <input name="uploadedfile" type="file" /> 
    </th> 
    <tr> 
    <td><input type="submit" name="upload" value="Convert File" /></td> 
    </tr> 
    </table> 
    </form> 
    _END; 

    } 
    if (!empty($_REQUEST['upload'])) { 
     $file   = "tmp/" . $_FILES['uploadedfile']['name']; 
     $orgfile  = $_FILES['uploadedfile']['name']; 
     $name   = str_replace(".xfdl", "", $orgfile); 
     $convertedfile = "tmp/" . $name . ".xml"; 
     $compressedfile = "tmp/" . $name . ".gz"; 
     $finalfile  = "tmp/" . $name . "new.xfdl"; 
     $target_path = "tmp/"; 
     $target_path = $target_path . basename($_FILES['uploadedfile']['name']); 
     if (move_uploaded_file($_FILES['uploadedfile']['tmp_name'], $target_path)) { 
     } else { 
      echo "There was an error uploading the file, please try again!"; 
     } 
     $firstline  = "application/vnd.xfdl; content-encoding=\"base64-gzip\"\n"; 
     $data   = file($file); 
     $data   = array_slice($data, 1); 
     $raw   = implode($data); 
     $decoded  = base64_decode($raw); 
     $decompressed = gzdecode($decoded); 
     $compressed  = gzencode($decompressed); 
     $encoded  = base64_encode($compressed); 
     $decoded2  = base64_decode($encoded); 
     $decompressed2 = gzdecode($decoded2); 
     $header   = bin2hex(substr($decoded, 0, 10)); 
     $tail   = bin2hex(substr($decoded, -8)); 
     $header2  = bin2hex(substr($compressed, 0, 10)); 
     $tail2   = bin2hex(substr($compressed, -8)); 
     $header3  = bin2hex(substr($decoded2, 0, 10)); 
     $tail3   = bin2hex(substr($decoded2, -8)); 
     $filehandle  = fopen($compressedfile, 'w'); 
     fwrite($filehandle, $decoded); 
     fclose($filehandle); 
     $filehandle  = fopen($convertedfile, 'w'); 
     fwrite($filehandle, $decompressed); 
     fclose($filehandle); 
     $filehandle  = fopen($finalfile, 'w'); 
     fwrite($filehandle, $firstline); 
     fwrite($filehandle, $encoded); 
     fclose($filehandle); 
     echo "<center>"; 
     echo "<table style='text-align:center' >"; 
     echo "<tr><th>Stage 1</th>"; 
     echo "<th>Stage 2</th>"; 
     echo "<th>Stage 3</th></tr>"; 
     echo "<tr><td>RAW DATA -></td><td>DECODED DATA -></td><td>UNCOMPRESSED DATA -></td></tr>"; 
     echo "<tr><td>LENGTH: ".strlen($raw)."</td>"; 
     echo "<td>LENGTH: ".strlen($decoded)."</td>"; 
     echo "<td>LENGTH: ".strlen($decompressed)."</td></tr>"; 
     echo "<tr><td><a href='tmp/".$orgfile."'/>ORIGINAL</a></td><td>GZIP HEADER:".$header."</td><td><a href='".$convertedfile."'/>XML CONVERTED</a></td></tr>"; 
     echo "<tr><td></td><td>GZIP TAIL:".$tail."</td><td></td></tr>"; 
     echo "<tr><td><textarea cols='30' rows='20'>" . $raw . "</textarea></td>"; 
     echo "<td><textarea cols='30' rows='20'>" . $decoded . "</textarea></td>"; 
     echo "<td><textarea cols='30' rows='20'>" . $decompressed . "</textarea></td></tr>"; 
     echo "<tr><th>Stage 6</th>"; 
     echo "<th>Stage 5</th>"; 
     echo "<th>Stage 4</th></tr>"; 
     echo "<tr><td>ENCODED DATA <-</td><td>COMPRESSED DATA <-</td><td>UNCOMPRESSED DATA <-</td></tr>"; 
     echo "<tr><td>LENGTH: ".strlen($encoded)."</td>"; 
     echo "<td>LENGTH: ".strlen($compressed)."</td>"; 
     echo "<td>LENGTH: ".strlen($decompressed)."</td></tr>"; 
     echo "<tr><td></td><td>GZIP HEADER:".$header2."</td><td></td></tr>"; 
     echo "<tr><td></td><td>GZIP TAIL:".$tail2."</td><td></td></tr>"; 
     echo "<tr><td><a href='".$finalfile."'/>FINAL FILE</a></td><td><a href='".$compressedfile."'/>RE-COMPRESSED FILE</a></td><td></td></tr>"; 
     echo "<tr><td><textarea cols='30' rows='20'>" . $encoded . "</textarea></td>"; 
     echo "<td><textarea cols='30' rows='20'>" . $compressed . "</textarea></td>"; 
     echo "<td><textarea cols='30' rows='20'>" . $decompressed . "</textarea></td></tr>"; 
     echo "</table>"; 
     echo "</center>"; 
    } 
    echo "</body></html>"; 
    ?>