numpy memmap: modifying files

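I'm having trouble understanding how numpy.memmap works. The background is that I need to shrink a large numpy array saved on disk by deleting entries. Reading the array and building a new one by copying the wanted parts does not work; it simply does not fit in memory. So the idea is to use numpy.memmap, i.e. to work on disk. Here is my code (with a tiny file):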

import numpy 

in_file = './in.npy' 
in_len = 10 
out_file = './out.npy' 
out_len = 5 

# Set up input dummy-file 
dummy_in = numpy.zeros(shape=(in_len,1),dtype=numpy.dtype('uint32')) 
for i in range(in_len): 
    dummy_in[i] = i + i 
numpy.save(in_file, dummy_in) 

# get dtype and shape from the in_file 
in_npy = numpy.load(in_file) 

in_dtype = in_npy.dtype 
in_shape = (in_npy.shape[0],1) 
del(in_npy) 

# generate an 'empty' out_file with the desired dtype and shape 
out_shape = (out_len,1) 
out_npy = numpy.zeros(shape=out_shape, dtype=in_dtype) 
numpy.save(out_file, out_npy) 
del(out_npy) 

# memmap both files 
in_memmap = numpy.memmap(in_file, mode='r', shape=in_shape, dtype=in_dtype) 
out_memmap = numpy.memmap(out_file, mode='r+', shape=out_shape, dtype=in_dtype) 
print("in_memmap")
print(in_memmap, "\n")
print("out_memmap before in_memmap copy")
print(out_memmap, "\n")

# copy some parts
for i in range(out_len):
    out_memmap[i] = in_memmap[i]

print("out_memmap after in_memmap copy")
print(out_memmap, "\n")
out_memmap.flush()

# test
in_data = numpy.load(in_file)
print("in.npy")
print(in_data)
print(in_data.dtype, "\n")

out_data = numpy.load(out_file)
print("out.npy")
print(out_data)
print(out_data.dtype, "\n")

Running this code, I get:

in_memmap 
[[1297436307] 
[  88400] 
[ 662372422] 
[1668506980] 
[ 540682098] 
[ 880098343] 
[ 656419879] 
[1953656678] 
[1601069426] 
[1701081711]] 

out_memmap before in_memmap copy 
[[1297436307] 
[  88400] 
[ 662372422] 
[1668506980] 
[ 540682098]] 

out_memmap after in_memmap copy 
[[1297436307] 
[  88400] 
[ 662372422] 
[1668506980] 
[ 540682098]] 

in.npy 
[[ 0] 
[ 2] 
[ 4] 
[ 6] 
[ 8] 
[10] 
[12] 
[14] 
[16] 
[18]] 
uint32 

out.npy 
[[0] 
[0] 
[0] 
[0] 
[0]] 
uint32

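From the output it is obvious that I am doing something wrong:

1) The memmaps do not contain the values set in the arrays, and in_memmap and out_memmap contain the same values.

2) It is unclear whether the copy command copies anything from in_memmap to out_memmap (because the values are identical). Checking in_memmap[i] and out_memmap[i] in the debugger, I get the same for both: memmap([1297436307], dtype=uint32). So can I assign them as in the code, or do I have to use out_memmap[i][0] = in_memmap[i][0]?

3) out.npy is not updated with the out_memmap values by the flush() operation.

Can anyone please help me understand what I am doing wrong here?

Thanks a lot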

Source

2017-08-08 fdiehl

Your problem seems to be that `np.save` and `np.memmap` use slightly different formats. See [this](https://stackoverflow.com/questions/23062674/numpy-memmap-map-to-save-file) answer.

Also, if you regularly work with arrays larger than RAM, check out [Dask](https://dask.pydata.org/en/latest/).

Replacing every instance of np.memmap with np.lib.format.open_memmap gives:

in_memmap 
[[ 0] 
[ 2] 
[ 4] 
[ 6] 
[ 8] 
[10] 
[12] 
[14] 
[16] 
[18]] 

out_memmap before in_memmap copy 
[[0] 
[0] 
[0] 
[0] 
[0]] 

out_memmap after in_memmap copy 
[[0] 
[2] 
[4] 
[6] 
[8]] 

in.npy 
[[ 0] 
[ 2] 
[ 4] 
[ 6] 
[ 8] 
[10] 
[12] 
[14] 
[16] 
[18]] 
uint32 

out.npy 
[[0] 
[2] 
[4] 
[6] 
[8]] 
uint32
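
A minimal sketch of that replacement, condensed from the question's script. The scratch directory and the names `in_mm` and `out_mm` are illustrative and not from the original post. It assumes that opening with mode "w+" is acceptable for the output file; in that mode open_memmap writes the .npy header itself, so the separate numpy.save step for an "empty" out.npy is not needed.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

workdir = tempfile.mkdtemp()  # hypothetical scratch directory
in_file = os.path.join(workdir, "in.npy")
out_file = os.path.join(workdir, "out.npy")
in_len, out_len = 10, 5

# Input file with the same contents as in the question: 0, 2, 4, ..., 18.
np.save(in_file, np.arange(0, 2 * in_len, 2, dtype=np.uint32).reshape(in_len, 1))

# Existing .npy file: shape and dtype are taken from the file header.
in_mm = open_memmap(in_file, mode="r")

# New .npy file: mode "w+" writes the header and allocates the data on disk.
out_mm = open_memmap(out_file, mode="w+", dtype=in_mm.dtype, shape=(out_len, 1))

# Slice assignment copies the rows without a Python-level loop.
out_mm[:] = in_mm[:out_len]
out_mm.flush()
del in_mm, out_mm  # drop the mappings before reading the file back

print(np.load(out_file).ravel())  # [0 2 4 6 8]
```

Row-wise assignment as in the question (out_memmap[i] = in_memmap[i]) also works once the maps point at the data; the slice form is just shorter.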

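np.save adds a header, and np.memmap reads that header as if it were data, which is why the contents of both maps look the same (it is the same header). The first value, 1297436307, is 0x4D554E93, i.e. the bytes \x93NUM that begin the .npy magic string. This is also why copying from one map to the other has no visible effect: it only copies header bytes, not the data. np.lib.format.open_memmap automatically skips the header so that you can work on the data.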
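
The header can be made visible with a short sketch. It maps a saved file from byte 0, as the question's code does, and then skips the header by hand using the helper functions in numpy.lib.format. The file path and variable names are illustrative, and the sketch assumes the file was written in .npy format version 1.0, which is what np.save produces for a small array like this one.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format

path = os.path.join(tempfile.mkdtemp(), "in.npy")  # hypothetical scratch file
np.save(path, np.arange(0, 20, 2, dtype=np.uint32).reshape(10, 1))

# Raw view from byte 0: these values are header bytes, not array data.
raw = np.memmap(path, mode="r", dtype="<u4", shape=(2,))
magic_words = raw.tolist()
print(magic_words)    # [1297436307, 88400]
print(raw.tobytes())  # b'\x93NUMPY\x01\x00'

# Parse the header to find out where the array data starts.
with open(path, "rb") as fp:
    version = npy_format.read_magic(fp)  # assumed to be (1, 0) here
    shape, fortran_order, dtype = npy_format.read_array_header_1_0(fp)
    offset = fp.tell()  # first byte after the header

# Map only the data region by passing the header size as offset.
mm = np.memmap(path, mode="r", dtype=dtype, shape=shape, offset=offset)
print(mm.ravel())
```

This is roughly the bookkeeping that open_memmap does for you, so in practice calling open_memmap directly is the simpler choice.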

Source

2017-08-08 13:02:29
