Strip header from a binary file

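I have a raw binary file that is several gigabytes in size, and I am trying to process it in chunks. Before I can start processing the data, I have to remove its header. Because of the raw binary format, none of the string methods such as .find, or checking for a string in the chunk, work. I want to strip the header automatically, but its length can vary, and my current approach of looking for the last newline character does not work because the raw binary data contains matching bytes.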

Data format: 
BEGIN_HEADER\r\n 
header with a varying number of lines\r\n 
HEADER_END\r\n raw data starts here

How I read the file:

filename = "binary_filename"
chunksize = 1024
with open(filename, "rb") as f:
    chunk = f.read(chunksize)
    for index, byte in enumerate(chunk):
        if byte == ord('\n'):
            print("found one " + str(index))

Is there an easy way to locate the HEADER_END\r\n line without sliding a byte window through the file? Current approach:

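chunk = f.read(chunksize)
index = 0
not_found = True
# Stop at the end of the chunk so the loop ends even when the tag is absent
while not_found and index + 12 <= len(chunk):
    if chunk[index:index+12] == b'HEADER_END\r\n':
        print("found")
        not_found = False
    index += 1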

Source

2016-06-28 kaminsknator

You can use linecache:

import linecache

# Note: linecache decodes the file as text and caches every line in memory.
# Assumes the HEADER_END line is present; otherwise this loop never ends.
currentline = 1  # linecache line numbers start at 1
while linecache.getline("file.bin", currentline) != "HEADER_END\n":
    currentline += 1

# Collect everything after the header line
currentline += 1
rawdata = ""
currentrawdata = linecache.getline("file.bin", currentline)
while currentrawdata:
    rawdata += currentrawdata
    currentline += 1
    currentrawdata = linecache.getline("file.bin", currentline)
print(rawdata)

UPDATE

We can split the problem in two: first remove the header, then read the data in chunks:

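# Read in binary mode so the raw bytes are not decoded or altered
with open('test_file.bin', 'rb') as src:
    lines = src.readlines()
currentline = 0
while lines[currentline] != b"HEADER_END\r\n":
    currentline += 1
# Write everything after the HEADER_END line
with open('newfile.bin', 'wb') as dst:
    dst.writelines(lines[currentline + 1:])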

This creates a file (newfile.bin) that contains just the raw data. It can now be read directly in chunks:

chunksize=1024 
with open('newfile.bin', "rb") as f: 
    chunk = f.read(chunksize)
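
The snippet above reads only the first chunk. A minimal sketch of looping over every chunk could look like the following; the helper name read_chunks is hypothetical and not part of the original answer.

```python
from functools import partial

def read_chunks(path, chunksize=1024):
    """Yield successive chunks of at most `chunksize` bytes from `path`."""
    with open(path, "rb") as f:
        # f.read returns b"" at end of file, which stops the iteration
        for chunk in iter(partial(f.read, chunksize), b""):
            yield chunk
```

It could be used as `for chunk in read_chunks('newfile.bin'): ...`, with the last chunk possibly shorter than chunksize.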

UPDATE 2

It is also possible to do this without an intermediate file:

# Size of each chunk
chunksize = 20
filename = 'test_file.bin'
endHeaderTag = b"HEADER_END\r\n"

# Find the line that holds HEADER_END (binary mode keeps the bytes intact)
with open(filename, 'rb') as f:
    lines = f.readlines()
currentline = 0
while lines[currentline] != endHeaderTag:
    currentline += 1
currentline += 1
# currentline is now the index of the first line of raw data

# Join the lines after the header into a single bytes object
header_stripped = b"".join(lines[currentline:])

# Split the data into successive chunks; the last one may be shorter
chunks = []
for start in range(0, len(header_stripped), chunksize):
    chunks.append(header_stripped[start:start + chunksize])
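
Because readlines() loads the whole file into memory, the code above may not suit a file of several gigabytes. A minimal sketch of a streaming variant follows, assuming the header consists of short \r\n-terminated lines as in the data format from the question. The function name iter_data_chunks is hypothetical.

```python
def iter_data_chunks(path, chunksize=1024, end_tag=b"HEADER_END\r\n"):
    """Skip the header, then yield the raw data in chunks of `chunksize` bytes."""
    with open(path, "rb") as f:
        # Read line by line only until the end-of-header tag is seen
        for line in iter(f.readline, b""):
            if line == end_tag:
                break
        else:
            # Reached end of file without seeing the tag
            raise ValueError("end-of-header tag not found")
        # The file position is now at the first byte of raw data
        while True:
            chunk = f.read(chunksize)
            if not chunk:
                break
            yield chunk
```

Only the header is read with readline(); once the tag is found, the raw data is read with fixed-size read() calls, so newline bytes inside the data do not matter.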

Source

2016-06-28 16:43:16 Trugis

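There are no newlines in the rest of the large binary file after the tag. Will it try to read the rest of the file as a single line? – kaminsknator

I had not realized that the raw data part might contain newlines; I have updated the answer. – Trugis

If there are no newlines in the data, will the raw data be read in as one large chunk? In that case the chunk would be several gigabytes. – kaminsknator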
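
On the concern about reading several gigabytes at once: in Python 3, bytes.find accepts a bytes pattern, so the tag can be searched for chunk by chunk with bounded memory. The following is only a rough sketch. The function name find_data_offset is hypothetical, and it assumes the first occurrence of the tag in the file is the real end of the header.

```python
def find_data_offset(path, end_tag=b"HEADER_END\r\n", chunksize=1024):
    """Return the file offset of the first byte after `end_tag`, or -1 if absent."""
    keep = len(end_tag) - 1  # bytes to carry over in case the tag straddles two chunks
    tail = b""               # carried-over end of the previous window
    consumed = 0             # bytes read from the file before the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunksize)
            if not chunk:
                return -1
            window = tail + chunk
            pos = window.find(end_tag)
            if pos != -1:
                # window starts at file offset consumed - len(tail)
                return consumed - len(tail) + pos + len(end_tag)
            tail = window[-keep:] if keep else b""
            consumed += len(chunk)
```

After that, `f.seek(offset)` followed by repeated `f.read(chunksize)` calls would read the raw data in chunks without ever holding the whole file in memory.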
