2011-04-24 66 views
8

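Saving a KDTree object in Python?

I am using SciPy's KDTree implementation with data read from a large 300 MB file. Is there a way to save the data structure to disk and load it again, or am I stuck reading the raw points from the file and rebuilding the data structure every time I start my program? I am building the KDTree as follows: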

def buildKDTree(self):
    # Read whitespace-separated coordinates and reshape to (n_points, NDIM).
    self.kdpoints = numpy.fromfile("All", sep=' ')
    self.kdpoints.shape = self.kdpoints.size // self.NDIM, self.NDIM
    self.kdtree = KDTree(self.kdpoints, leafsize=self.kdpoints.shape[0] + 1)
    print("Preparing KDTree... Ready!")

Any suggestions?

+1

Have you tried pickling? – helloworld922 2011-04-24 21:04:48

+0

When I tried using cPickle on a KDTree object, I got an error on my machine – JoshAdel 2011-04-24 22:19:04

Answers

10

KDTree uses nested classes to define its node types (innernode, leafnode). Pickle only works on classes defined at module level, so the nested classes trip it up:

# Python 2 (cPickle, print statement)
import cPickle

class Foo(object):
    class Bar(object):
        pass

obj = Foo.Bar()
print obj.__class__
cPickle.dumps(obj)

<class '__main__.Bar'> 
cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed 
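
For comparison, here is a minimal sketch of the same experiment in Python 3, where classes carry a `__qualname__` and pickle protocol 4 (Python 3.4 and later) can look up nested classes by that qualified name. It is only an illustration of the pickling rule, not a statement about how any particular SciPy version behaves; `Foo` and `Bar` are the same toy names as above.

```python
import pickle


class Foo:
    class Bar:
        pass


obj = Foo.Bar()

# Python 3 records the qualified name of the nested class.
print(Foo.Bar.__qualname__)

# Protocol 4 stores that qualified name, so the nested class can be found again.
raw = pickle.dumps(obj, protocol=4)
restored = pickle.loads(raw)
print(type(restored) is Foo.Bar)
```

The failure shown above is therefore specific to Python 2 style pickling, where only the bare class name `Bar` is recorded and looked up on the module.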

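However, there is a (hacky) workaround: monkey-patch the class definitions into scipy.spatial.kdtree at module scope so that the pickler can find them. If all of your code that reads and writes pickled KDTree objects installs these patches, this trick should work fine: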

import cPickle 
import numpy 
from scipy.spatial import kdtree 

# patch module-level attribute to enable pickle to work 
kdtree.node = kdtree.KDTree.node 
kdtree.leafnode = kdtree.KDTree.leafnode 
kdtree.innernode = kdtree.KDTree.innernode 

x, y = numpy.mgrid[0:5, 2:8] 
t1 = kdtree.KDTree(zip(x.ravel(), y.ravel())) 
r1 = t1.query([3.4, 4.1]) 
raw = cPickle.dumps(t1) 

# read in the pickled tree 
t2 = cPickle.loads(raw) 
r2 = t2.query([3.4, 4.1]) 
print t1.tree.__class__ 
print repr(raw)[:70] 
print t1.data[r1[1]], t2.data[r2[1]] 

Output:

<class 'scipy.spatial.kdtree.innernode'> 
"ccopy_reg\n_reconstructor\np1\n(cscipy.spatial.kdtree\nKDTree\np2\nc_ 
[3 4] [3 4] 
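
The monkey-patch above targets the SciPy and Python 2 of the time. As a hedged sketch for Python 3, and assuming the installed SciPy is recent enough that `scipy.spatial.KDTree` supports pickling on its own (worth verifying on your version), the same round trip can be tried without any patching:

```python
import pickle

import numpy as np
from scipy.spatial import KDTree

# Same 5 x 6 grid of integer points as in the example above.
x, y = np.mgrid[0:5, 2:8]
points = np.column_stack([x.ravel(), y.ravel()])

tree = KDTree(points)
dist1, idx1 = tree.query([3.4, 4.1])

# No monkey-patching: this relies on the installed SciPy supporting pickle.
raw = pickle.dumps(tree, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(raw)
dist2, idx2 = restored.query([3.4, 4.1])

# Both trees should report the same nearest neighbour.
print(points[idx1], points[idx2])
```

If this works on your installation, writing `raw` to a file and reading it back with `pickle.loads` gives the save/load behaviour the question asks for. If it raises a pickling error, the patch above (or rebuilding the tree) remains the fallback.
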
+0

Do you also have a patch for the Cython cKDTree? – denis 2011-04-25 11:41:19

+0

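@Denis Unfortunately I don't have a patch for cKDTree. Some form of save/load method should be possible, but it would be more custom, because [cKDTree](http://svn.scipy.org/svn/scipy/trunk/scipy/spatial/ckdtree.pyx) nodes are malloc'd structs rather than classes. – samplebias 2011-04-25 13:32:50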

+0

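Unfortunately I get the error: "maximum recursion depth exceeded while calling a Python object". To be fair, my tree is computed from a list of 1,000,000 5-D coordinates. Since it only takes a few minutes to compute the tree from that array (and the array itself I can save and load via numpy), I guess I'll have to live with it. – CastleH 2014-09-16 15:53:43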
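
The fallback described in this comment (save the array with numpy, rebuild the tree on startup) might be sketched as follows. The helper names `load_points` and `build_tree`, and the idea of a `.npy` cache file next to the text file, are assumptions made for illustration and do not come from the thread.

```python
import os

import numpy as np
from scipy.spatial import KDTree


def load_points(text_path, cache_path, ndim):
    """Return the (n, ndim) point array, using a binary .npy cache when present.

    cache_path should end in ".npy", because numpy.save appends that
    suffix when it is missing.
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)
    # Slow path: parse the whitespace-separated text file once.
    points = np.fromfile(text_path, sep=" ").reshape(-1, ndim)
    np.save(cache_path, points)
    return points


def build_tree(text_path, cache_path, ndim):
    """Rebuild the tree from the cached array instead of pickling the tree."""
    return KDTree(load_points(text_path, cache_path, ndim))
```

This only avoids re-parsing the text file; the tree itself is still constructed on every start, which is the cost the commenter decided to accept.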