2011-12-31

Objects with circular references are not garbage collected

I use a lot of small convenience classes like the following in my code:

class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self

The nice thing about it is that you can access a value either with dictionary key syntax or as an ordinary object attribute:

myStructure = Structure(name="My Structure") 
print myStructure["name"] 
print myStructure.name 

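Today I noticed that my application's memory consumption was increasing slightly in a situation where I would have expected it to decrease. It looks to me as if instances created from the Structure class are not garbage collected. Here is a small snippet that illustrates this: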

import gc 

class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self

structures = [Structure(name="__{0}".format(str(value))) for value in range(4096)] 
print "Structure name: ", structures[16].name 
print "Structure name: ", structures[16]["name"] 
del structures 
gc.collect() 
print "Structures count: ", len([obj for obj in gc.get_objects() if type(obj) is Structure]) 

It produces the following output:

Structure name: __16 
Structure name: __16 
Structures count: 4096 

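As you can see, the Structure instance count is still 4096.

I then commented out the line that creates the convenient self-reference: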

import gc 

class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        # self.__dict__ = self

structures = [Structure(name="__{0}".format(str(value))) for value in range(4096)] 
# print "Structure name: ", structures[16].name 
print "Structure name: ", structures[16]["name"] 
del structures 
gc.collect() 
print "Structures count: ", len([obj for obj in gc.get_objects() if type(obj) is Structure]) 

Now that the circular reference has been removed, the output makes sense:

Structure name: __16 
Structures count: 0 
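To make the difference between the two runs more concrete, here is a minimal sketch that looks at a single instance. It relies on `weakref` and `gc.disable()`, neither of which appears in the snippets above, and it is written to run on both Python 2.7 and Python 3. It assumes CPython's reference counting. On Python 2.7.2 the last value would presumably stay `True`, matching the count of 4096 above; later interpreters may reclaim the object, so that value is only printed, not relied on.

```python
import gc
import weakref


class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self  # the instance now holds a reference to itself


gc.disable()  # rule out an automatic cyclic collection during the demo
try:
    s = Structure(name="demo")
    is_self_referential = s.__dict__ is s
    probe = weakref.ref(s)  # observes the object without keeping it alive

    del s  # drop the only external reference
    # The self-reference keeps the reference count above zero.
    alive_after_del = probe() is not None

    gc.collect()  # only the cyclic collector can reclaim the object now
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print("self-referential: {0}".format(is_self_referential))
print("alive after del: {0}".format(alive_after_del))
print("alive after gc.collect(): {0}".format(alive_after_collect))
```

With the `self.__dict__ = self` line removed, `alive_after_del` should already be `False`, because nothing refers to the instance any more once `s` is deleted.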

I pushed the test a bit further and used Meliae to analyse the memory consumption:

import gc 
import pprint 
from meliae import scanner 
from meliae import loader 

class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self

structures = [Structure(name="__{0}".format(str(value))) for value in range(4096)] 
print "Structure name: ", structures[16].name 
print "Structure name: ", structures[16]["name"] 
del structures 
gc.collect() 
print "Structures count: ", len([obj for obj in gc.get_objects() if type(obj) is Structure]) 

scanner.dump_all_objects("Test_001.json") 
om = loader.load("Test_001.json") 
summary = om.summarize() 
print summary 

structures = om.get_all("Structure") 
if structures: 
    pprint.pprint(structures[0].c) 

This produces the following output:

Structure name: __16 
Structure name: __16 
Structures count: 4096 
loading... line 5001, 5002 objs, 0.6/ 1.8 MiB read in 0.2s 
loading... line 10002, 10003 objs, 1.1/ 1.8 MiB read in 0.3s 
loading... line 15003, 15004 objs, 1.7/ 1.8 MiB read in 0.5s 
loaded line 16405, 16406 objs, 1.8/ 1.8 MiB read in 0.5s   
checked  1/ 16406 collapsed  0  
checked 16405/ 16406 collapsed  157  
compute parents  0/ 16249   
compute parents 16248/ 16249   
set parents 16248/ 16249   
collapsed in 0.2s 
Total 16249 objects, 58 types, Total size = 3.2MiB (3306183 bytes) 
Index Count %  Size % Cum  Max Kind 
    0 4096 25 1212416 36 36  296 Structure 
    1  390 2 536976 16 52 49432 dict 
    2 5135 31 417550 12 65 12479 str 
    3  82 0 290976 8 74 12624 module 
    4  235 1 212440 6 80  904 type 
    5  947 5 121216 3 84  128 code 
    6 1008 6 120960 3 88  120 function 
    7 1048 6  83840 2 90  80 wrapper_descriptor 
    8  654 4  47088 1 92  72 builtin_function_or_method 
    9  562 3  40464 1 93  72 method_descriptor 
    10  517 3  37008 1 94  216 tuple 
    11  139 0  35832 1 95 2280 set 
    12  351 2  30888 0 96  88 weakref 
    13  186 1  23200 0 97 1664 list 
    14  63 0  21672 0 97  344 WeakSet 
    15  21 0  18984 0 98  904 ABCMeta 
    16  197 1  14184 0 98  72 member_descriptor 
    17  188 1  13536 0 99  72 getset_descriptor 
    18  284 1  6816 0 99  24 int 
    19  14 0  5296 0 99 2280 frozenset 
[Structure(4312707312 296B 2refs 2par), 
type(4298634592 904B 4refs 100par 'Structure')] 

Memory usage is 3.2 MiB. Removing the self-reference line produces the following output instead:

Structure name: __16 
Structures count: 0 
loading... line 5001, 5002 objs, 0.6/ 1.4 MiB read in 0.1s 
loading... line 10002, 10003 objs, 1.1/ 1.4 MiB read in 0.3s 
loaded line 12308, 12309 objs, 1.4/ 1.4 MiB read in 0.4s   
checked  12/ 12309 collapsed  0  
checked 12308/ 12309 collapsed  157  
compute parents  0/ 12152   
compute parents 12151/ 12152   
set parents 12151/ 12152   
collapsed in 0.1s 
Total 12152 objects, 57 types, Total size = 2.0MiB (2093714 bytes) 
Index Count %  Size % Cum  Max Kind 
    0  390 3 536976 25 25 49432 dict 
    1 5134 42 417497 19 45 12479 str 
    2  82 0 290976 13 59 12624 module 
    3  235 1 212440 10 69  904 type 
    4  947 7 121216 5 75  128 code 
    5 1008 8 120960 5 81  120 function 
    6 1048 8  83840 4 85  80 wrapper_descriptor 
    7  654 5  47088 2 87  72 builtin_function_or_method 
    8  562 4  40464 1 89  72 method_descriptor 
    9  517 4  37008 1 91  216 tuple 
    10  139 1  35832 1 92 2280 set 
    11  351 2  30888 1 94  88 weakref 
    12  186 1  23200 1 95 1664 list 
    13  63 0  21672 1 96  344 WeakSet 
    14  21 0  18984 0 97  904 ABCMeta 
    15  197 1  14184 0 98  72 member_descriptor 
    16  188 1  13536 0 98  72 getset_descriptor 
    17  284 2  6816 0 99  24 int 
    18  14 0  5296 0 99 2280 frozenset 
    19  22 0  2288 0 99  104 classobj 

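This confirms that the Structure instances have been destroyed and that memory usage has dropped to 2.0 MiB.

Any ideas on how I can make sure this class gets garbage collected properly? By the way, all of this was run on Python 2.7.2 (Darwin).

Cheers,

Thomas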

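Why would you want such a self-reference? Even if you insist on the duality of attribute access and item lookup (which, IMHO, goes against the Zen of Python), there are better, simpler ways to achieve it. – delnan 2011-12-31 11:53:14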

Answers

You can implement your Structure class more directly with __getattr__ and __setattr__, forwarding attribute access to the underlying dict.

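class Structure(dict):
    def __getattr__(self, k):
        # Only called when normal attribute lookup fails.
        try:
            return self[k]
        except KeyError:
            # Keeps hasattr() and getattr() defaults working.
            raise AttributeError(k)
    def __setattr__(self, k, v):
        self[k] = v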

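Python does garbage collect cycles, but only periodically (unlike ordinary reference-counted objects, which are reclaimed as soon as their reference count drops to 0).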
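The distinction between the two reclamation paths can be sketched as follows. This is only an illustration under CPython's reference-counting model: `Node` and `is_alive` are hypothetical helpers invented for the example, and automatic collection is switched off with `gc.disable()` so that the timing is deterministic.

```python
import gc
import weakref


class Node(object):
    """Hypothetical helper class, used only for this illustration."""

    def __init__(self):
        self.other = None


def is_alive(ref):
    """Return True while the weakly referenced object still exists."""
    return ref() is not None


gc.disable()  # keep the cyclic collector from running on its own
try:
    # No cycle: reference counting frees the object as soon as it is unreachable.
    plain = Node()
    plain_ref = weakref.ref(plain)
    del plain
    plain_alive_after_del = is_alive(plain_ref)

    # Cycle: a and b keep each other's reference count above zero.
    a, b = Node(), Node()
    a.other, b.other = b, a
    cycle_ref = weakref.ref(a)
    del a, b
    cycle_alive_after_del = is_alive(cycle_ref)

    gc.collect()  # one explicit pass of the cyclic collector
    cycle_alive_after_collect = is_alive(cycle_ref)
finally:
    gc.enable()

print("plain object alive after del: {0}".format(plain_alive_after_del))
print("cycle alive after del: {0}".format(cycle_alive_after_del))
print("cycle alive after gc.collect(): {0}".format(cycle_alive_after_collect))
```

In a normal program the collector is not disabled, so a cycle like this one lingers only until the next automatic pass.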

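Avoiding the cycle (as a Structure class built on __getattr__ and __setattr__ does) means you get better gc behaviour. You might also want to look at collections.namedtuple as a good alternative: it is not exactly what you implemented, but it may suit your purpose.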
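As a rough check of that claim, the sketch below builds a cycle-free variant of the class and confirms that the instance stays mutable and is released without any help from the gc module. The `__delattr__` method, the `AttributeError` translation and the `weakref` probe are additions made for this example rather than part of the answer above, and the immediate release assumes CPython's reference counting.

```python
import weakref


class Structure(dict):
    """Dict whose keys can also be read, written and deleted as attributes."""

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        self[key] = value

    def __delattr__(self, key):
        try:
            del self[key]
        except KeyError:
            raise AttributeError(key)


s = Structure(name="My Structure")
s.size = 3             # an attribute write lands in the dict
s["name"] = "Renamed"  # an item write is visible as an attribute
snapshot = (s.name, s["size"], sorted(s.keys()), hasattr(s, "missing"))

probe = weakref.ref(s)
del s  # no cycle, so reference counting frees the instance right here
freed_without_gc = probe() is None

print(snapshot)
print("freed without gc.collect(): {0}".format(freed_without_gc))
```

One trade-off of this design is that keys which collide with dict method names, such as `items` or `keys`, are only reachable through item syntax, since normal attribute lookup finds the method first.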

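Hi Paul, cheers! That looks like a good alternative. I actually read about it in this post (http://ruslanspivak.com/2011/06/12/the-bunch-pattern/). Apparently the garbage collection bug is also known (http://bugs.python.org/issue1469629). About namedtuple: I looked at it a long time ago, but I need my data to be mutable. – 2011-12-31 12:01:27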