2017-07-27 73 views
0

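Converting a little-endian 24-bit file to an ASCII array

I have a .raw file that contains a 52-line HTML header followed by the data itself. The data is encoded as little-endian 24-bit SIGNED integers, and I want to convert it to integers in an ASCII file. I am using Python 3.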

I tried to "unpack" the whole file with the following code, which I found in this post:

import sys 
import chunk 
import struct 

f1 = open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode = 'rb') 
data = struct.unpack('<i', chunk + ('\0' if chunk[2] < 128 else '\xff')) 

But I get this error message:

TypeError: 'module' object is not subscriptable 

Edit

This seems to be better:

data = struct.unpack('<i','\0'+ bytes)[0] >> 8 

But I still get an error message:

TypeError: must be str, not type 

An easy fix, I'm sure?

+0

Can you post the result of 'f1.read()'? – Tomalak

+1

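1) Screen dumps are discouraged: they take a lot of storage, are not reusable, and are not searchable. 2) The problem is with the *chunk* module. It may be a name clash between the module name and the instance variable you chose. Or did you forget to instantiate something like the *Chunk* class? – guidot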

+0

You need to separate the binary data from the HTML first. Don't use 'bytes' as a variable name, because it clashes with Python's own 'bytes' type –

Answer

0

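This is not a nice file to process in Python! Python is great at processing text files, because it reads them in large chunks into an internal buffer and then iterates over the lines, but it does not give easy access to binary data that comes after such a text read. In addition, the struct module has no support for 24-bit values.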

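The only way I can imagine is to read the file one byte at a time: first skip the 52 lines by looking for their end-of-line bytes, then read 3 bytes at a time, pad them into a 4-byte byte string, and unpack that.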

Possible code could be:

import struct

eol = b'\n'   # or whatever the end-of-line byte is in your file
nlines = 52   # number of header lines to skip

with open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode='rb') as f1:

    for i in range(nlines):         # process nlines lines
        t = b''                     # to store the content of each line
        while True:
            x = f1.read(1)          # one byte at a time
            if x == eol or not x:   # full line read, or unexpected end of file
                break
            t += x                  # else concatenate into current line
        print(t)                    # to check the initial 52 lines

    while True:
        t = bytes((0,))             # struct only knows how to process 4-byte ints
        for i in range(3):          # so build one starting with a null byte
            t += f1.read(1)
        # print(t)
        if len(t) == 1:             # reached end of file
            break
        if len(t) < 4:              # reached end of file with an incomplete value
            print("Remaining bytes at end of file", t)
            break
        # the trick is that the integer division by 256 drops the initial 0 byte and keeps the sign
        i = struct.unpack('<i', t)[0] // 256   # // in Python 3, plain / in Python 2
        print(i, hex(i))            # or any other more useful processing
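
To see why padding with a low zero byte and dividing by 256 keeps the sign, here is a minimal sketch that applies the trick to a few hand-picked byte triples. The helper names are hypothetical, not part of the code above. As a cross-check it also decodes the same bytes with Python 3's built-in `int.from_bytes`, which accepts 3-byte input directly.

```python
import struct

def int24_via_struct(triple: bytes) -> int:
    """Decode 3 little-endian bytes by prepending a zero low byte (hypothetical helper)."""
    if len(triple) != 3:
        raise ValueError("expected exactly 3 bytes")
    # The zero byte becomes the least significant byte, so the 24-bit value
    # sits in the top three bytes and its sign bit is the 32-bit sign bit.
    return struct.unpack('<i', b'\x00' + triple)[0] // 256

def int24_via_from_bytes(triple: bytes) -> int:
    """Same decoding with the built-in int.from_bytes (Python 3 only)."""
    return int.from_bytes(triple, byteorder='little', signed=True)

# Smallest positive, largest positive, most negative, and -1.
samples = [b'\x01\x00\x00', b'\xff\xff\x7f', b'\x00\x00\x80', b'\xff\xff\xff']
for s in samples:
    print(s.hex(), int24_via_struct(s), int24_via_from_bytes(s))
```

Both helpers should agree on every triple, so either one could replace the unpack line in the loop, provided the file really holds signed little-endian values.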

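Note: the code above assumes that your description of 52 lines (each terminated by an end of line) is accurate, but the image you showed suggests that the last line is not terminated. In that case, you should first skip 51 lines, and then skip the content of the last line.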

def skiplines(fd, nlines, eol):
    for i in range(nlines):         # process nlines lines
        t = b''                     # to store the content of each line
        while True:
            x = fd.read(1)          # one byte at a time
            if x == eol or not x:   # full line read, or unexpected end of file
                break
            t += x                  # else concatenate into current line
        # print(t)                  # to check the skipped lines

with open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode='rb') as f1:
    skiplines(f1, 51, b'\n')    # skip 51 lines terminated with a \n
    skiplines(f1, 1, b'>')      # skip last line assuming it ends at the >

    ...
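
The `...` placeholder is where the remaining bytes get decoded. Since the goal in the question is an ASCII file of integers, here is one possible sketch of that last step. It assumes the header has already been skipped, so that the rest of the file can be read in one go (for example with `f1.read()`). The function names `decode_int24_le` and `write_ascii` are made up for illustration, and the demo uses in-memory data instead of the real file.

```python
import io

def decode_int24_le(payload: bytes) -> list:
    """Decode a byte string of signed little-endian 24-bit integers (hypothetical helper)."""
    usable = len(payload) - len(payload) % 3   # ignore an incomplete trailing value
    return [int.from_bytes(payload[i:i + 3], byteorder='little', signed=True)
            for i in range(0, usable, 3)]

def write_ascii(values, out) -> None:
    """Write one integer per line to a text-mode file object (hypothetical helper)."""
    for v in values:
        out.write(f"{v}\n")

# Demo on in-memory data: three complete values plus one stray trailing byte.
payload = b'\x01\x00\x00' + b'\xff\xff\xff' + b'\x00\x00\x80' + b'\x12'
out = io.StringIO()
write_ascii(decode_int24_le(payload), out)
print(out.getvalue())
```

With the real file, `out` would be a file opened in text mode (`open(some_path, 'w')`, path of your choice) instead of the `io.StringIO` object used here.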
+0

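Thank you very much for your answer and for the detailed explanation, which I needed since I have only just started programming. I cross-checked the results against Matlab code and, unsurprisingly, it works perfectly! Thanks again! – ananas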