Ignoring nested structures in numpy array creation

I want to write to a vlen HDF5 dataset, and I use h5py.Dataset.write_direct to speed up the process. Suppose I have a list of numpy arrays (e.g. as returned by cv2.findContours) and the dataset:

dataset = h5file.create_dataset('dataset',
                                shape=...,
                                dtype=h5py.special_dtype(vlen='int32'))
contours = [...]  # list of numpy arrays

To write contours to a destination given by the slice dest, I first have to convert contours into a numpy array of numpy arrays:

contours = numpy.array(contours) # shape=(len(contours),); dtype=object 
dataset.write_direct(contours, None, dest)

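But this only works if the numpy arrays in contours have different shapes. When they all share one shape, numpy stacks them into a regular array, e.g.: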

contours = [np.zeros((10,), 'int32'), np.zeros((10,), 'int32')] 
contours = numpy.array(contours) # shape=(2,10); dtype='int32'
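
To make the failure concrete, here is a small sketch of what np.array does with equally shaped inputs. The names same, stacked and forced are only illustrative. Note that passing dtype=object changes the element type but not the shape, so it does not solve the problem on its own:

```python
import numpy as np

same = [np.zeros((10,), 'int32'), np.zeros((10,), 'int32')]

# Equal shapes: numpy stacks the inputs into one 2-D array.
stacked = np.array(same)
print(stacked.shape, stacked.dtype)   # (2, 10) int32

# dtype=object still descends into the nested arrays, so the result
# is a 2-D array of scalars, not a 1-D array holding two arrays.
forced = np.array(same, dtype=object)
print(forced.shape, forced.dtype)     # (2, 10) object
```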

The question is: how can I tell numpy to create an array of objects?

Possible solutions:

Create it manually:

contours_np = np.empty((len(contours),), dtype=object) 
for i, contour in enumerate(contours): 
    contours_np[i] = contour

But the loop is super slow, so I use map instead:

# Python 3 form: index the (i, contour) pair and wrap in list() so map actually runs
list(map(lambda pair: contours_np.__setitem__(pair[0], pair[1]),
         enumerate(contours)))

I also tested a second option, which is about twice as fast as the above but also super ugly:

contours = np.array(contours + [None])[:-1]

Here is a micro-benchmark:

l = [np.random.normal(size=100) for _ in range(1000)]

Option 1:

$ start = time.time(); l_array = np.zeros(shape=(len(l),), dtype='O'); map(lambda (i, c): l_array.__setitem__(i, c), enumerate(l)); end = time.time(); print("%fms" % ((end - start) * 10**3)) 
0.950098ms

Option 2:

$ start = time.time(); np.array(l + [None])[:-1]; end = time.time(); print("%fms" % ((end - start) * 10**3)) 
0.409842ms

This looks ugly. Any other suggestions?

Asked 2016-06-11 by user1447257

In this version

contours_np = np.empty((len(contours),), dtype=object) 
for i, contour in enumerate(contours): 
    contours_np[i] = contour

you can replace the loop with a single statement:

contours_np[...] = contours
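
As a quick check of the single-statement form, here is a minimal sketch. The two arrays are hypothetical stand-ins for contours of different lengths, and the sketch only exercises that ragged case. For identically shaped arrays, the element-by-element loop remains the conservative choice.

```python
import numpy as np

# Hypothetical stand-ins for contours of different lengths.
contours = [np.arange(3, dtype='int32'), np.arange(5, dtype='int32')]

contours_np = np.empty((len(contours),), dtype=object)
contours_np[...] = contours   # each slot receives one inner array

print(contours_np.shape, contours_np.dtype)   # (2,) object
print([c.shape for c in contours_np])         # [(3,), (5,)]
```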

Answered 2016-06-11 10:34:18

This is exactly what I was looking for :) – user1447257

One solution seems to be to create the "outer" array first (with 'object' dtype) and then fill its elements with the inner arrays.

Like this:

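import numpy as np

contours = [np.zeros((10,), 'int32'), np.zeros((10,), 'int32')]
# Use the builtin object: the np.object alias was removed in NumPy 1.24
a = np.empty(len(contours), dtype=object)
for i in range(len(contours)):
    a[i] = contours[i]  # store each inner array as a single element
print(a)
print()
print(repr(a))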

Result:

[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32) 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)] 

array([array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32), 
     array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)], dtype=object)
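
The same idea can be wrapped in a small helper. This is only a sketch, and the name as_object_array is made up for illustration. Because each element is assigned individually, numpy never inspects the nested shapes, so equally shaped inputs stay separate:

```python
import numpy as np

def as_object_array(arrays):
    """Pack a sequence of arrays into a 1-D object array, one array per slot."""
    out = np.empty(len(arrays), dtype=object)
    for i, arr in enumerate(arrays):
        out[i] = arr   # single-element assignment keeps arr intact
    return out

same = [np.zeros((10,), 'int32'), np.zeros((10,), 'int32')]
packed = as_object_array(same)
print(packed.shape, packed.dtype)         # (2,) object
print(packed[0].shape, packed[0].dtype)   # (10,) int32
```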

Answered 2016-06-11 10:26:57 by Evert
