压缩和解压缩整数数组的函数

我最近被要求完成C++角色的任务，但是由于应用程序决定不再进一步发展，我以为我会在这里发布一些反馈/建议/改进/提醒我已经忘记的概念。压缩和解压缩整数数组的函数

的任务是：

下面的数据是一个时间序列整数值

int timeseries[32] = {67497, 67376, 67173, 67235, 67057, 67031, 66951, 
66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044, 
67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620, 
66579, 66596, 66713, 66852, 66715};

该系列可能是，例如，每天股票的收盘价较32日期。

如上所述，假设4字节整数，数据将占用32 x sizeof(int) bytes = 128 bytes 。

使用增量编码，编写一个函数来压缩，并且函数解压缩数据如上。好吧，所以在这之前，我从来没有看过压缩，所以我的解决方案还很不完善。我接近这个问题的方式是将整数数组压缩成一个字节数组。当将整数表示为一个字节时，我保留计算最重要的字节（msb），并将所有内容保留至此点，同时将其余部分丢弃。然后将其添加到字节数组中。对于负值，我将msb增加1，这样我们可以通过保留前导 1位值来区分正解码字节和负解码字节。

解码时，我解析这个参差不齐的字节数组，简单地颠倒了我以前在压缩时执行的操作。如前所述，我从未在此任务之前查看过压缩，因此我确实想出了自己的方法来压缩数据。我最近在看C++/Cli，之前没有真正使用它，所以只是决定用这种语言编写它，没有特别的理由。以下是课程，以及最底层的单元测试。任何建议/改进/增强将非常感谢。

谢谢。

array<array<Byte>^>^ CDeltaEncoding::CompressArray(array<int>^ data) 
{ 
    int temp = 0; 
    int original; 
    int size = 0; 

    array<int>^ tempData = gcnew array<int>(data->Length); 
    data->CopyTo(tempData, 0); 

    array<array<Byte>^>^ byteArray = gcnew array<array<Byte>^>(tempData->Length); 

    for (int i = 0; i < tempData->Length; ++i) 
    { 
      original = tempData[i]; 
      tempData[i] -= temp; 
      temp = original; 

      int msb = GetMostSignificantByte(tempData[i]); 

      byteArray[i] = gcnew array<Byte>(msb); 
      System::Buffer::BlockCopy(BitConverter::GetBytes(tempData[i]), 0, byteArray[i], 0, msb); 

      size += byteArray[i]->Length; 
    } 

    return byteArray; 
} 

array<int>^ CDeltaEncoding::DecompressArray(array<array<Byte>^>^ buffer) 
{ 
    System::Collections::Generic::List<int>^ decodedArray = gcnew System::Collections::Generic::List<int>(); 

    int temp = 0; 

    for (int i = 0; i < buffer->Length; ++i) 
    { 
      int retrievedVal = GetValueAsInteger(buffer[i]); 
      decodedArray->Add(retrievedVal); 

      decodedArray[i] += temp; 
      temp = decodedArray[i]; 
    } 

    return decodedArray->ToArray(); 
} 

int CDeltaEncoding::GetMostSignificantByte(int value) 
{ 
    array<Byte>^ tempBuf = BitConverter::GetBytes(Math::Abs(value)); 

    int msb = tempBuf->Length; 

    for (int i = tempBuf->Length -1; i >= 0; --i) 
    { 
      if (tempBuf[i] != 0) 
      { 
        msb = i + 1; 
        break; 
      } 
    } 

    if (!IsPositiveInteger(value)) 
    { 
      //We need an extra byte to differentiate the negative integers 
      msb++; 
    } 

    return msb; 
} 

bool CDeltaEncoding::IsPositiveInteger(int value) 
{ 
    return value/Math::Abs(value) == 1; 
} 

int CDeltaEncoding::GetValueAsInteger(array<Byte>^ buffer) 
{ 
    array<Byte>^ tempBuf; 

    if(buffer->Length % 2 == 0) 
    { 
      //With even integers there is no need to allocate a new byte array 
      tempBuf = buffer; 
    } 
    else 
    { 
      tempBuf = gcnew array<Byte>(4); 
      System::Buffer::BlockCopy(buffer, 0, tempBuf, 0, buffer->Length); 

      unsigned int val = buffer[buffer->Length-1] &= 0xFF; 

      if (val == 0xFF) 
      { 
        //We have negative integer compressed into 3 bytes 
        //Copy over the this last byte as well so we keep the negative pattern 
        System::Buffer::BlockCopy(buffer, buffer->Length-1, tempBuf, buffer->Length, 1); 
      } 
    } 

    switch(tempBuf->Length) 
    { 
    case sizeof(short): 
      return BitConverter::ToInt16(tempBuf,0); 
    case sizeof(int): 
    default: 
      return BitConverter::ToInt32(tempBuf,0); 
    } 
}

，然后在测试类我：

void CTestDeltaEncoding::TestCompression() 
{ 
    array<array<Byte>^>^ byteArray = CDeltaEncoding::CompressArray(m_testdata); 

    array<int>^ decompressedArray = CDeltaEncoding::DecompressArray(byteArray); 

    int totalBytes = 0; 
    for (int i = 0; i<byteArray->Length; i++) 
    { 
      totalBytes += byteArray[i]->Length; 
    } 

    Assert::IsTrue(m_testdata->Length * sizeof(m_testdata) > totalBytes, "Expected the total bytes to be less than the original array!!"); 

    //Expected totalBytes = 53 
}

来源

2009-10-20 Zahir

这闻起来很像功课给我。关键词是：“使用增量编码”。

德尔塔编码意味着，每个号码与下一个之间编码的Δ（差）：

67497，67376，67173，67235，67057，67031，66951，66974，67042，67025，66897，67077，67082 ，67033，67019，67149，67044，67012，67220，67239，66893，66984，66866，66693，66770，66722，66620，66579，66596，66713，66852，66715

会变成：

[Base：67497]：-121，-203，+62

等等。假设8位字节，原始数字需要每个字节3个字节（给定具有3字节整数类型的编译器数量，通常每个字节最多需要4个字节）。从外观上看，这些差异很容易以每个字节2个字节为单位，如果你可以忽略一个（或者可能是两个）最不重要的位，你可以将它们放入一个字节中。

增量编码最常用于声音编码之类的地方，您可以在不出现重大问题的情况下“精确地”确定准确性。例如，如果您从一个样本到另一个样本的变化大于您留下空间进行编码，您可以编码当前差异的最大变化，并将该差异添加到下一个增量（如果您不'不介意一些回溯，你也可以分发一些回到之前的三角洲）。这将作为低通滤波器，限制样本之间的梯度。

例如，在您给出的系列中，简单的增量编码需要十位来表示所有的差异。然而，通过丢弃LSB，几乎所有的样本（实际上除一个之外）都可以用8位编码。那个有-173的差别（右移一位），所以如果我们将它表示为-128，我们剩下45个。我们可以在前后样本之间均匀分配该错误。在这种情况下，输出不会与输入完全匹配，但如果我们正在讨论类似声音的事情，则差异可能不会特别明显。

来源

2009-10-20 15:03:14

我确实提到过这是我必须完成的一项练习，并且我收到的解决方案被认为不够好，所以我希望看到一些有建设性的反馈，因为实际公司从未决定告诉您您做错了什么。

当数组被压缩时，我存储的差异，而不是原来的值，因为这是我的理解。如果你看过我的代码，我提供了一个完整的解决方案，但是我的问题是多么糟糕？

来源

2009-10-20 16:30:43 Zahir

压缩和解压缩整数数组的函数

回答

相关问题