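h5py: how to read selected rows of an hdf5 file?
Is it possible to read a given set of rows from an hdf5 file without loading the whole file? I have a fairly large HDF5 file with loads of datasets. Here is an example of what I have in mind to reduce time and memory usage: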
#! /usr/bin/env python
import numpy as np
import h5py
infile = 'field1.87.hdf5'
f = h5py.File(infile,'r')
group = f['Data']
mdisk = group['mdisk'].value
val = 2.*pow(10.,10.)
ind = np.where(mdisk>val)[0]
m = group['mcold'][ind]
print m
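`ind` does not give consecutive rows; the rows are rather scattered.
The above code fails, even though it follows the standard way of slicing an hdf5 dataset. The error message I get is: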
Traceback (most recent call last):
File "./read_rows.py", line 17, in <module>
m = group['mcold'][ind]
File "/cosma/local/Python/2.7.3/lib/python2.7/site-packages/h5py-2.3.1-py2.7-linux-x86_64.egg/h5py/_hl/dataset.py", line 425, in __getitem__
selection = sel.select(self.shape, args, dsid=self.id)
File "/cosma/local/Python/2.7.3/lib/python2.7/site-packages/h5py-2.3.1-py2.7-linux-x86_64.egg/h5py/_hl/selections.py", line 71, in select
sel[arg]
File "/cosma/local/Python/2.7.3/lib/python2.7/site-packages/h5py-2.3.1-py2.7-linux-x86_64.egg/h5py/_hl/selections.py", line 209, in __getitem__
raise TypeError("PointSelection __getitem__ only works with bool arrays")
TypeError: PointSelection __getitem__ only works with bool arrays
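The message indicates that this h5py version accepts a boolean array for point selection but not a NumPy integer array. Two workarounds that the h5py documentation describes are indexing with a boolean mask, or with a plain Python list of indices in increasing order. Below is a minimal sketch, assuming 1-D datasets of equal length. The temporary file and its values are made up for illustration; only the names `Data`, `mdisk` and `mcold` come from the question.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in file (hypothetical values, not the real data).
path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(path, "w") as f:
    g = f.create_group("Data")
    g.create_dataset("mdisk", data=np.array([1e9, 3e10, 5e9, 4e10, 2.5e10]))
    g.create_dataset("mcold", data=np.array([10.0, 11.0, 12.0, 13.0, 14.0]))

with h5py.File(path, "r") as f:
    group = f["Data"]
    mdisk = group["mdisk"][...]      # note: this still reads all of mdisk
    mask = mdisk > 2.0e10            # boolean array, same length as the dataset

    # Option 1: index the dataset with the boolean mask directly.
    m_from_mask = group["mcold"][mask]

    # Option 2: index with a plain list of increasing indices.
    ind = np.nonzero(mask)[0]        # already sorted in increasing order
    m_from_list = group["mcold"][ind.tolist()]

print(m_from_mask)
print(m_from_list)
```

Both options return only the selected elements of `mcold` as a NumPy array. How much data h5py actually reads from disk for a scattered selection is not something this sketch measures.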
Saying that it 'fails' without showing the error message or what is wrong is a big no-no here. – hpaulj 2015-02-09 21:27:42
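You are loading the whole `mdisk` array into memory. I would have to dig into the documentation to determine how much of `mcold` gets loaded. That may depend on whether `ind` is a compact slice or a set of values scattered throughout the array. – hpaulj 2015-02-09 21:32:32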
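One way to avoid holding all of `mdisk` in memory at once is to scan it in blocks and keep only the matching indices. The following is a rough sketch under that assumption: `indices_above` and its `block` parameter are hypothetical names introduced here, and the temporary file holds made-up values.

```python
import os
import tempfile

import h5py
import numpy as np


def indices_above(dset, threshold, block=1000000):
    """Return increasing indices where dset > threshold, reading `block` rows at a time."""
    hits = []
    for start in range(0, dset.shape[0], block):
        chunk = dset[start:start + block]   # only this slice is requested from the file
        hits.append(np.nonzero(chunk > threshold)[0] + start)
    if not hits:
        return np.empty(0, dtype=np.intp)
    return np.concatenate(hits)


# Small stand-in file (hypothetical values).
path = os.path.join(tempfile.mkdtemp(), "demo_blocks.hdf5")
with h5py.File(path, "w") as f:
    g = f.create_group("Data")
    g.create_dataset("mdisk", data=np.array([1e9, 3e10, 5e9, 4e10, 2.5e10]))
    g.create_dataset("mcold", data=np.arange(10.0, 15.0))

with h5py.File(path, "r") as f:
    group = f["Data"]
    ind = indices_above(group["mdisk"], 2.0e10, block=2)  # tiny block just for the demo
    m = group["mcold"][ind.tolist()]                      # list of increasing indices

print(ind)
print(m)
```

Peak memory for the scan is bounded by the block size plus the collected indices. Whether this is faster than a single full read depends on the file layout and was not measured here.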