Unzipping huge gz files in Java and performance

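I'm unzipping huge gz files in Java. The gz files are about 2 GB and the unzipped files about 6 GB. From time to time it takes hours, and sometimes it finishes in a reasonable time (say, 10 minutes or less).
I have a fairly powerful box (8 GB RAM, 4 CPUs). Is there a way to improve the code below, or should I use a completely different library?
Also, I passed -Xms256m and -Xmx4g to the VM.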

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

public static File unzipGZ(File file, File outputDir) {
    GZIPInputStream in = null;
    OutputStream out = null;
    File target = null;
    try {
        // Open the compressed file
        in = new GZIPInputStream(new FileInputStream(file));

        // Open the output file (FileUtil.stripFileExt is a project helper, not part of the JDK;
        // presumably it drops the ".gz" extension)
        target = new File(outputDir, FileUtil.stripFileExt(file.getName()));
        out = new FileOutputStream(target);

        // Transfer bytes from the compressed file to the output file
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) > 0) {
            out.write(buf, 0, len);
        }

        // Close the file and stream
        in.close();
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                // Failure while closing the input: just log it
                e.printStackTrace();
            }
        }
        if (out != null) {
            try {
                out.close();
            } catch (IOException e) {
                // Failure while closing the output: just log it
                e.printStackTrace();
            }
        }
    }
    return target;
}

2011-02-14 user121196

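@user121196: "billions" and Java don't mix. If you are in control of the system and it's a Un*x box, I'd consider calling an external process here. It's not pretty, but there's a reason why software that manipulates truly huge files, or truly huge numbers of files (like Git, Mercurial, etc.), isn't written in Java... – Gugussee 2011-02-14 10:52:36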
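A minimal sketch of the external-process idea, assuming a Unix-like box with the gzip binary on the PATH (ExternalGunzip and gunzip are hypothetical names). ProcessBuilder.redirectOutput(File), available since Java 7, lets the child process write straight into the target file, so the decompressed bytes never pass through the JVM:

```java
import java.io.File;
import java.io.IOException;

final class ExternalGunzip {
    /**
     * Decompresses gz into target by running "gzip -dc gz".
     * Assumption: a Unix-like system with gzip on the PATH.
     */
    static void gunzip(File gz, File target) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("gzip", "-dc", gz.getAbsolutePath());
        pb.redirectOutput(target);                          // gzip's stdout goes directly to the file
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);  // show gzip's error messages on our stderr
        Process p = pb.start();
        int exit = p.waitFor();
        if (exit != 0) {
            // gzip signals a corrupt or truncated archive only via its exit status and stderr
            throw new IOException("gzip exited with status " + exit);
        }
    }
}
```

Usage would be ExternalGunzip.gunzip(new File("big.gz"), new File("big")). Whether this actually beats an in-JVM loop on a given box is something to measure rather than assume.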

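I don't know how much buffering is applied by default, if any, but you might want to try wrapping both the input and the output in a BufferedInputStream/BufferedOutputStream. You could also try increasing your buffer size: 1K is a pretty small buffer. Experiment with different sizes, e.g. 16K, 64K, etc. Of course, a bigger buffer should make the BufferedInputStream rather less important.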
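A minimal sketch of those suggestions, assuming Java 7+ for try-with-resources (BufferedGunzip and the 64 KB size are illustrative choices, not measured optima). Note that GZIPInputStream also has a two-argument constructor taking its internal input buffer size; the one-argument constructor uses a default size (512 bytes in OpenJDK):

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

final class BufferedGunzip {
    // 64 KB is one of the sizes suggested above; 16 KB, 256 KB, etc. are worth benchmarking too.
    private static final int BUF_SIZE = 64 * 1024;

    static void gunzip(File gz, File target) throws IOException {
        // try-with-resources closes both streams exactly once, even if an exception is thrown.
        try (InputStream in = new GZIPInputStream(
                     new BufferedInputStream(new FileInputStream(gz), BUF_SIZE), BUF_SIZE);
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target), BUF_SIZE)) {
            byte[] buf = new byte[BUF_SIZE];
            int len;
            while ((len = in.read(buf)) != -1) {   // -1 marks end of stream
                out.write(buf, 0, len);
            }
        }
    }
}
```

With the copy buffer as large as the BufferedOutputStream's buffer, most writes bypass the latter anyway, which matches the point that the Buffered* wrappers matter less once the buffer is big.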

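On the other hand, I suspect this isn't really the problem. If it sometimes finishes in 10 minutes and sometimes takes hours, that suggests something very odd is going on. When it takes a long time, is it actually making progress? Is the output file growing? Is it using significant CPU? Is the disk busy the whole time?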
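One way to answer the "is it making progress?" question from inside the program is to log throughput while copying. A minimal sketch (ProgressGunzip and the reportEvery parameter are hypothetical); CPU and disk activity are easier to watch with OS tools such as top or iostat:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

final class ProgressGunzip {
    /** Decompresses gz into target, printing a progress line to stderr every reportEvery bytes. */
    static long gunzip(File gz, File target, long reportEvery) throws IOException {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        long start = System.nanoTime();
        long total = 0;
        long nextReport = reportEvery;
        try (InputStream in = new GZIPInputStream(new FileInputStream(gz), 64 * 1024);
             OutputStream out = new FileOutputStream(target)) {
            byte[] buf = new byte[64 * 1024];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
                total += len;
                if (total >= nextReport) {
                    double secs = (System.nanoTime() - start) / 1e9;
                    System.err.printf("%,d bytes written after %.1f s (%.1f MB/s)%n",
                            total, secs, total / 1e6 / secs);
                    while (nextReport <= total) {
                        nextReport += reportEvery;   // skip thresholds already passed
                    }
                }
            }
        }
        return total;
    }
}
```

Called as ProgressGunzip.gunzip(gz, target, 256L * 1024 * 1024), it prints a line every 256 MB; if the lines stop while the process is still alive, the stall is in the read/write path rather than in slow but steady decompression.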

A side note: since you close in and out in the finally block, you don't need to do it in the try block as well.

2011-02-14 10:51:36

If you have 8 gigs of memory and the input file is 2 gigs, you could try using a memory-mapped file. Here is an example of how to do it.
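A minimal sketch of feeding a memory-mapped input file to GZIPInputStream (MappedFileInputStream and MappedGunzip are hypothetical names). One catch: FileChannel.map cannot map more than Integer.MAX_VALUE bytes into a single buffer, and a "2 GB" file can exceed that, so this maps the file in windows (256 MB here, an arbitrary choice). Decompression itself is unchanged, so whether mapping helps at all is something to benchmark:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: reads a file through successive read-only memory-mapped windows. */
final class MappedFileInputStream extends InputStream {
    private final FileChannel channel;
    private final long size;          // total file length
    private final long windowSize;    // bytes mapped per window
    private long mappedEnd = 0;       // file offset just past the current window
    private MappedByteBuffer window;  // null until the first read

    MappedFileInputStream(Path path, long windowSize) throws IOException {
        if (windowSize <= 0 || windowSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("window must be 1.." + Integer.MAX_VALUE + " bytes");
        }
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
        this.size = channel.size();
        this.windowSize = windowSize;
    }

    /** Maps the next window if the current one is exhausted; returns false at end of file. */
    private boolean ensureWindow() throws IOException {
        if (window != null && window.hasRemaining()) return true;
        if (mappedEnd >= size) return false;
        long len = Math.min(windowSize, size - mappedEnd);
        window = channel.map(FileChannel.MapMode.READ_ONLY, mappedEnd, len);
        mappedEnd += len;
        return true;
    }

    @Override public int read() throws IOException {
        return ensureWindow() ? (window.get() & 0xFF) : -1;
    }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!ensureWindow()) return -1;
        int n = Math.min(len, window.remaining());
        window.get(b, off, n);
        return n;
    }

    @Override public int available() {
        // Unmapped remainder plus what is left in the current window.
        long left = (size - mappedEnd) + (window == null ? 0 : window.remaining());
        return (int) Math.min(left, Integer.MAX_VALUE);
    }

    @Override public void close() throws IOException {
        channel.close();  // mappings stay valid until their buffers are garbage-collected
    }
}

final class MappedGunzip {
    static long gunzip(Path gz, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(new MappedFileInputStream(gz, 256L << 20), 64 * 1024);
             OutputStream out = Files.newOutputStream(target)) {
            byte[] buf = new byte[64 * 1024];
            long total = 0;
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
                total += len;
            }
            return total;
        }
    }
}
```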

2011-02-14 10:49:28 aioobe

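Try using channels from java.nio: there is a method that transfers bytes from one file channel to another, so you don't have to copy them yourself. It is probably fairly well optimized. See FileInputStream.getChannel().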
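A minimal sketch of the channel approach (ChannelGunzip and the 64 MB chunk are hypothetical). A transfer between two FileChannels would copy the compressed bytes verbatim, so the GZIPInputStream still has to sit in the middle; Channels.newChannel turns it into a ReadableByteChannel for FileChannel.transferFrom. When the source is not a FileChannel, the JDK likely copies through a temporary buffer anyway, so the gain over a hand-written loop may be small:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPInputStream;

final class ChannelGunzip {
    // Upper bound per transferFrom call; an arbitrary choice, not a tuned value.
    private static final long CHUNK = 64L * 1024 * 1024;

    static long gunzip(File gz, File target) throws IOException {
        try (InputStream in = new GZIPInputStream(new FileInputStream(gz), 64 * 1024);
             ReadableByteChannel src = Channels.newChannel(in);   // decompressed bytes as a channel
             FileChannel dst = FileChannel.open(target.toPath(),
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                     StandardOpenOption.WRITE)) {
            long pos = 0;
            long n;
            // transferFrom may move fewer bytes than requested; it returns 0 once src is exhausted.
            while ((n = dst.transferFrom(src, pos, CHUNK)) > 0) {
                pos += n;
            }
            return pos;
        }
    }
}
```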

2011-02-14 11:13:17 jmg
