2017-05-31 153 views
8

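ZLib decompression fails on large byte array

While experimenting with ZLib compression, I ran into a strange problem. Decompressing a zlib-compressed byte array of random data fails if the source array is at least 32752 bytes long. Here is a small program that reproduces the problem; you can see it in action on IDEOne. The compress and decompress methods are standard code taken from tutorials.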

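import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibMain {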

    private static byte[] compress(final byte[] data) { 
     final Deflater deflater = new Deflater(); 
     deflater.setInput(data); 

     deflater.finish(); 
     final byte[] bytesCompressed = new byte[Short.MAX_VALUE]; 
     final int numberOfBytesAfterCompression = deflater.deflate(bytesCompressed); 
     final byte[] returnValues = new byte[numberOfBytesAfterCompression]; 
     System.arraycopy(bytesCompressed, 0, returnValues, 0, numberOfBytesAfterCompression); 
     return returnValues; 

    } 

    private static byte[] decompress(final byte[] data) { 
     final Inflater inflater = new Inflater(); 
     inflater.setInput(data); 
     try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length)) { 
      final byte[] buffer = new byte[Math.max(1024, data.length/10)]; 
      while (!inflater.finished()) { 
       final int count = inflater.inflate(buffer); 
       outputStream.write(buffer, 0, count); 
      } 
      outputStream.close(); 
      final byte[] output = outputStream.toByteArray(); 
      return output; 
     } catch (DataFormatException | IOException e) { 
      throw new RuntimeException(e); 
     } 
    } 

    public static void main(final String[] args) { 
     roundTrip(100); 
     roundTrip(1000); 
     roundTrip(10000); 
     roundTrip(20000); 
     roundTrip(30000); 
     roundTrip(32000); 
     for (int i = 32700; i < 33000; i++) { 
      if(!roundTrip(i))break; 
     } 
    } 

    private static boolean roundTrip(final int i) { 
     System.out.printf("Starting round trip with size %d: ", i); 
     final byte[] data = new byte[i]; 
     for (int j = 0; j < data.length; j++) { 
      data[j]= (byte) j; 
     } 
     shuffleArray(data); 

     final byte[] compressed = compress(data); 
     try { 
      final byte[] decompressed = CompletableFuture.supplyAsync(() -> decompress(compressed)) 
                 .get(2, TimeUnit.SECONDS); 
      System.out.printf("Success (%s)%n", Arrays.equals(data, decompressed) ? "matching" : "non-matching"); 
      return true; 
     } catch (InterruptedException | ExecutionException | TimeoutException e) { 
      System.out.println("Failure!"); 
      return false; 
     } 
    } 

    // Implementing Fisher–Yates shuffle 
    // source: https://stackoverflow.com/a/1520212/342852 
    static void shuffleArray(byte[] ar) { 
     Random rnd = ThreadLocalRandom.current(); 
     for (int i = ar.length - 1; i > 0; i--) { 
      int index = rnd.nextInt(i + 1); 
      // Simple swap 
      byte a = ar[index]; 
      ar[index] = ar[i]; 
      ar[i] = a; 
     } 
    } 
} 

Is this a known bug in zlib? Or is there a mistake in my compress/decompress routines?

Answers

4

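It is a logic error in the compress/decompress methods. I am not that deep into the implementation, but with some debugging I found the following: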

When a buffer of 32752 bytes is compressed, the deflater.deflate() method returns 32767, which is exactly the size of the output buffer you allocate in this line:

final byte[] bytesCompressed = new byte[Short.MAX_VALUE]; 

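If you increase the buffer size, for example to

final byte[] bytesCompressed = new byte[4 * Short.MAX_VALUE]; 

you will see that the 32752-byte input actually deflates to 32768 bytes. So in your code the output buffer fills up before the deflater is done, and the compressed data does not contain everything it should.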
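A minimal sketch of how the truncation could be detected, assuming nothing beyond java.util.zip: after a single deflate() call, deflater.finished() stays false if the output buffer was too small. The class name TruncationCheck, the helper fitsInOneCall, the 40000-byte input and the fixed seed are made up for illustration.

```java
import java.util.Random;
import java.util.zip.Deflater;

public class TruncationCheck {

    // Hypothetical helper: performs one deflate() call into a buffer of the
    // given size and reports whether the deflater finished, i.e. whether the
    // whole compressed stream fit into that buffer.
    static boolean fitsInOneCall(byte[] data, int bufferSize) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] out = new byte[bufferSize];
            deflater.deflate(out);
            return deflater.finished();
        } finally {
            deflater.end(); // release native resources
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[40000];
        new Random(42).nextBytes(data); // random bytes are essentially incompressible

        // The buffer from the question is smaller than the compressed output.
        System.out.println(fitsInOneCall(data, Short.MAX_VALUE));
        // The enlarged buffer has room for the complete stream.
        System.out.println(fitsInOneCall(data, 4 * Short.MAX_VALUE));
    }
}
```

If finished() is false after the call, the bytes copied out of the buffer are only a prefix of the zlib stream.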

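When you then try to decompress, the inflater.inflate() method returns zero, which indicates that more input data is needed. But since you only check inflater.finished(), you end up in an endless loop.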
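One way to guard the loop against this, sketched under the assumption that failing fast is acceptable: treat a zero return from inflate() on an unfinished stream as an error when needsInput() or needsDictionary() is true. The class name SafeInflate and the choice of IllegalStateException are illustrative, not taken from the original code.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class SafeInflate {

    // Decompresses a complete zlib stream; throws instead of spinning forever
    // when the input is truncated or requires a preset dictionary.
    static byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                out.write(buffer, 0, count);
                // No output and not finished: the inflater is stuck waiting
                // for input (or a dictionary) that will never arrive.
                if (count == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("Truncated or dictionary-dependent zlib data");
                }
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end(); // release native resources
        }
    }
}
```

With this check, the truncated array produced by the question's compress() should fail with an exception rather than run into the two-second timeout.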

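So you can either increase the buffer size used for compression, which probably just moves the problem to bigger inputs, or better, rewrite the compress/decompress logic to process the data in chunks.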
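As a sketch of the chunked approach, one option is to let the JDK's stream wrappers do the buffering instead of calling deflate() and inflate() directly. This swaps in DeflaterOutputStream and InflaterInputStream, which is not what the original code uses; the class name StreamZlib is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class StreamZlib {

    // Compresses to zlib format; closing the stream finishes the deflater,
    // so the output is complete regardless of the input size.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Decompresses zlib data by reading fixed-size chunks until end of stream.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

Neither method depends on a fixed-size output array, so the 32752-byte boundary from the question should no longer matter. Truncated input is expected to surface as an IOException from read() rather than as a hang.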

+0

Thanks. As I wrote, it wasn't my code, and I have replaced it now. But thank you for the insight into what was wrong with it. –

+0

It was a nice question; I like hunting bugs like this ;-) –

+0

Very nice investigation! – nobeh

4

Apparently the compress() method was faulty. This one works:

public static byte[] compress(final byte[] data) { 
    try (final ByteArrayOutputStream outputStream = 
            new ByteArrayOutputStream(data.length);) { 

     final Deflater deflater = new Deflater(); 
     deflater.setInput(data); 
     deflater.finish(); 
     final byte[] buffer = new byte[1024]; 
     while (!deflater.finished()) { 
      final int count = deflater.deflate(buffer); 
      outputStream.write(buffer, 0, count); 
     } 

     final byte[] output = outputStream.toByteArray(); 
     return output; 
    } catch (IOException e) { 
     throw new IllegalStateException(e); 
    } 
} 
+2

You also need to check whether inflater.inflate() returns 0 –