2016-12-03 80 views

When does an assignment make a deep copy in Python?

I have the following code:

import pandas as pd 
store = pd.HDFStore('cache.h5') 
data = store['data'] 

In this case, is 'data' a deep, in-memory copy of the HDF5 data, or is it a pointer to the original data on disk?


What exactly do you mean by *a pointer to the original data on disk*? – Tobias

Answer


It is an in-memory object. Changes made to it are not automatically reflected (flushed) back to disk.

Demo:

In [16]: fn = r'D:\temp\.data\test.h5' 

In [17]: store = pd.HDFStore(fn) 

In [18]: store 
Out[18]: 
<class 'pandas.io.pytables.HDFStore'> 
File path: D:\temp\.data\test.h5 
/df2    frame_table (typ->appendable,nrows->7,ncols->4,indexers->[index],dc->[Col1,Col2,Col3,Col4]) 
/test   frame_table (typ->appendable,nrows->7,ncols->4,indexers->[index],dc->[Col1,Col2,Col3,Col4]) 

Read a DataFrame (an in-memory object) from disk (the HDF store):

In [19]: data = store['test'] 

In [20]: data 
Out[20]: 
        Col1      Col2  Col3  Col4
0       what       the     0     0
1        are    curves     1     8
2        men        of     2    16
3         to      your     3    24
4      rocks      lips     4    32
5        and   rewrite     5    40
6  mountains  history.     6    48

In [21]: data.Col4 = 1000 

In [22]: data 
Out[22]: 
        Col1      Col2  Col3  Col4
0       what       the     0  1000
1        are    curves     1  1000
2        men        of     2  1000
3         to      your     3  1000
4      rocks      lips     4  1000
5        and   rewrite     5  1000
6  mountains  history.     6  1000

In [23]: store.close() 

In [24]: store = pd.HDFStore(fn) 

In [25]: store['test'] 
Out[25]: 
        Col1      Col2  Col3  Col4
0       what       the     0     0
1        are    curves     1     8
2        men        of     2    16
3         to      your     3    24
4      rocks      lips     4    32
5        and   rewrite     5    40
6  mountains  history.     6    48
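
The same behaviour can be reproduced as a standalone script. This is only a minimal sketch: the sample frame, the temporary file and the variable names are made up for illustration, and it assumes PyTables is installed so that pd.HDFStore works. It also shows one way to persist the change, by writing the frame back explicitly with store.put.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample frame; the column names only mirror the demo above.
df = pd.DataFrame({"Col1": ["what", "are", "men"], "Col4": [0, 8, 16]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.h5")  # throwaway file for this sketch

    # Write the frame once so there is something on disk to read.
    with pd.HDFStore(path) as store:
        store.put("test", df)

    with pd.HDFStore(path) as store:
        data = store["test"]      # reading builds a new DataFrame in memory
        data["Col4"] = 1000       # modifies only that in-memory object
        unchanged = store["test"]["Col4"].tolist()  # fresh read from disk

    # Persisting the change requires an explicit write.
    with pd.HDFStore(path) as store:
        store.put("test", data)
        persisted = store["test"]["Col4"].tolist()

print(unchanged)  # values on disk before the explicit write
print(persisted)  # values on disk after the explicit write
```

Until store.put is called, the file keeps the original values no matter what happens to the DataFrame.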

UPDATE: the following small demo shows that the 'data' DF does not depend on 'store' once it has been read from the HDF store:

In [26]: store.close() 

In [27]: store = pd.HDFStore(fn) 

In [28]: del data 

In [29]: data = store['test'] 

Let's delete the 'store' object:

In [30]: del store 

'data' still exists:

In [31]: data 
Out[31]: 
        Col1      Col2  Col3  Col4
0       what       the     0     0
1        are    curves     1     8
2        men        of     2    16
3         to      your     3    24
4      rocks      lips     4    32
5        and   rewrite     5    40
6  mountains  history.     6    48
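
On the more general question in the title: a plain assignment in Python never copies anything, it only binds a name to an object. In data = store['data'] the copy is made by the read itself, which builds a new DataFrame from the file; the assignment then just names that new object. The sketch below illustrates the difference using only the standard library. The dict and its contents are hypothetical stand-ins, not the HDF data.

```python
import copy

# Hypothetical nested object standing in for "some data".
original = {"Col4": [0, 8, 16]}

alias = original                # assignment: a second name for the same object
shallow = copy.copy(original)   # new dict, but the inner list is still shared
deep = copy.deepcopy(original)  # new dict and a new inner list

original["Col4"][0] = 1000      # mutate the inner list in place

print(alias is original)    # the alias is the very same object
print(alias["Col4"][0])     # sees the change
print(shallow["Col4"][0])   # also sees it, because the list is shared
print(deep["Col4"][0])      # unaffected by the change
```

Only the explicit deepcopy call produces an independent object. Reading from the store behaves like the deep case because the DataFrame is constructed from the bytes on disk.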

Does this mean it is a deep copy? If I read from 'data', am I now reading from RAM or from disk? – cjm2671


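Yes, you can think of 'data' as a deep copy of one table in the HDF store. Basically it is a DataFrame (an in-memory object), whereas cache.h5 is an HDF (h5) file (on disk) that may contain multiple tables (DataFrames). – MaxU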


Thanks, that's very helpful! – cjm2671
