2011-05-21 94 views
1

Memory leak while looping the web.client.getPage function

I have something that periodically refreshes a page using this script:

from twisted.web.client import getPage 
from twisted.internet import reactor, task 

def getData(): 
    dgp = getPage('http://www.google.com/') 
    dgp.addCallback(dataLoadOK) 
    dgp.addErrback(dataLoadError) 

def dataLoadOK(value): 
    print value 

def dataLoadError(error): 
    print error 

loop = task.LoopingCall(getData) 
loop.start(10, now=True) 
reactor.run() 

But when I use it this way, I get a memory leak. Can anyone help me find it?

Edit: I have tried using the garbage collection Python module, and got this output:

GARBAGE OBJECTS: 
:: <HTTPClientFactory: http://www.google.com/> 
     type: <type 'instance'> 
referrers: 3 
    is class: True 
    module: <module 'twisted.web.client' from '/usr/lib/python2.7/site-packages/twisted/web/client.pyc'> 

:: {'status': '200', 'cookies': {'PREF': 'ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI', 'NID': '47=LxM9fbBBN-bVIeuLPOfvO-fgXOKw1n2suyZ2... 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

:: InsensitiveDict({}) 
     type: <type 'instance'> 
referrers: 3 
    is class: True 
    module: <module 'twisted.python.util' from '/usr/lib/python2.7/site-packages/twisted/python/util.pyc'> 

:: {'preserve': 1, 'data': {}} 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

:: <Deferred at 0x29e2cf8 current result: None> 
     type: <type 'instance'> 
referrers: 3 
    is class: True 
    module: <module 'twisted.internet.defer' from '/usr/lib/python2.7/site-packages/twisted/internet/defer.pyc'> 

:: {'_chainedTo': None, 'called': True, '_canceller': None, 'callbacks': [], 'result': None, '_runningCallbacks': False} 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

:: <<class 'twisted.internet.tcp.Client'> to ('www.google.com', 80) at 2445090> 
     type: <class 'twisted.internet.tcp.Client'> 
referrers: 3 
    is class: True 
    module: <module 'twisted.internet.tcp' from '/usr/lib/python2.7/site-packages/twisted/internet/tcp.pyc'> 
    line num: 681 
     line: class Client(BaseClient): 
     line:  """A TCP client.""" 
     line: 
     line:  def __init__(self, host, port, bindAddress, connector, reactor=None): 
     line:   # BaseClient.__init__ is invoked later 
     line:   self.connector = connector 
     line:   self.addr = (host, port) 
     line: 
     line:   whenDone = self.resolveAddress 
     line:   err = None 
     line:   skt = None 
     line: 
     line:   try: 
     line:    skt = self.createInternetSocket() 
     line:   except socket.error, se: 
     line:    err = error.ConnectBindError(se[0], se[1]) 
     line:    whenDone = None 
     line:   if whenDone and bindAddress is not None: 
     line:    try: 
     line:     skt.bind(bindAddress) 
     line:    except socket.error, se: 
     line:     err = error.ConnectBindError(se[0], se[1]) 
     line:     whenDone = None 
     line:   self._finishInit(whenDone, skt, err, reactor) 
     line: 
     line:  def getHost(self): 
     line:   """Returns an IPv4Address. 
     line: 
     line:   This indicates the address from which I am connecting. 
     line:   """ 
     line:   return address.IPv4Address('TCP', *(self.socket.getsockname() + ('INET',))) 
     line: 
     line:  def getPeer(self): 
     line:   """Returns an IPv4Address. 
     line: 
     line:   This indicates the address that I am connected to. 
     line:   """ 
     line:   return address.IPv4Address('TCP', *(self.realAddress + ('INET',))) 
     line: 
     line:  def __repr__(self): 
     line:   s = '<%s to %s at %x>' % (self.__class__, self.addr, unsignedID(self)) 
     line:   return s 

:: {'_tempDataBuffer': [], 'disconnected': 1, 'dataBuffer': '', '_tempDataLen': 0, 'realAddress': ('74.125.225.81', 80), 'connector': <twisted.internet.tcp.Connect... 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

:: [] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: {'x-xss-protection': ['1; mode=block'], 'set-cookie': ['PREF=ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI; expires=Tue, 21-May-2013 0... 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

:: ['-1'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: ['private, max-age=0'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: ['text/html; charset=ISO-8859-1'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: ['PREF=ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI; expires=Tue, 21-May-2013 08:34:12 GMT; path=/; domain=.google.com', 'NID=47=LxM9... 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: ['gws'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: ['1; mode=block'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: [] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: <twisted.internet.tcp.Connector instance at 0x29e2cb0> 
     type: <type 'instance'> 
referrers: 3 
    is class: True 
    module: <module 'twisted.internet.tcp' from '/usr/lib/python2.7/site-packages/twisted/internet/tcp.pyc'> 

:: ['Sun, 22 May 2011 08:34:12 GMT'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: {'reactor': <twisted.internet.selectreactor.SelectReactor object at 0x288bd10>, 'state': 'disconnected', 'factoryStarted': 0, 'bindAddress': None, 'factory': <H... 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

So I see some unreleased references inside Twisted's functions. How can I avoid this?

Answers

3

Try some of the strategies recommended in the related questions. However, it is quite likely that you do not have a memory leak at all, only memory fragmentation.

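It looks like the "Python memory leak detector" has a very serious bug. It enables DEBUG_LEAK, which prevents all cycles from being collected. In other words, it creates a huge number of leaks itself. If you just add some code to your example that reports the contents of gc.garbage without enabling DEBUG_LEAK, it will stay empty (even with no gc debugging flags enabled, gc.garbage will be populated if any objects are actually leaking).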
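
The effect described above can be reproduced without Twisted. The following is only a minimal sketch: the Node class and the garbage_count helper are hypothetical names made up for illustration, and it sets gc.DEBUG_SAVEALL (the part of gc.DEBUG_LEAK that keeps collected objects in gc.garbage) rather than the full DEBUG_LEAK flag, to avoid the verbose per-object output on stderr.

```python
import gc


class Node(object):
    """Hypothetical object that refers to itself, forming a reference cycle."""

    def __init__(self):
        self.me = self


def garbage_count(debug_flags, cycles=100):
    """Create `cycles` cyclic objects, run a collection, and return
    how many objects ended up in gc.garbage under the given flags."""
    gc.collect()               # flush any garbage that already exists
    gc.set_debug(debug_flags)
    del gc.garbage[:]
    for _ in range(cycles):
        Node()                 # dropped at once, kept alive only by its cycle
    gc.collect()
    found = len(gc.garbage)
    gc.set_debug(0)            # restore normal behaviour
    del gc.garbage[:]
    return found


without_flags = garbage_count(0)
with_saveall = garbage_count(gc.DEBUG_SAVEALL)
print("no debug flags: %d objects in gc.garbage" % without_flags)
print("DEBUG_SAVEALL:  %d objects in gc.garbage" % with_saveall)
```

With no flags set, the cycles are simply freed and gc.garbage stays empty; with the flag set, every collected object is kept in gc.garbage, which a tool can then misreport as a leak.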

+0

I just updated my post with the results of hunting for the leak; the number of garbage objects increases every time getData() runs – BGE 2011-05-22 08:50:45

+0

Updated the answer to discuss the bug in the "Python memory leak detector". – 2011-05-22 13:41:46

2

The way you schedule the looping call could be a problem. You do not return the Deferred from getData, so calls may pile up.

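If retrieving your web page takes longer than 10 seconds, a second getData will be called before the first one has finished. If you are hitting a site that tries to throttle you (and google.com certainly will), then the more requests pile up, the more it will delay you. Each attempt takes up some memory, which could look like a leak.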

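If that is the problem (though you should use the techniques Jean-Paul suggested to find out whether it actually is), you can fix it by adding "return dgp" at the end of your getData function.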

+0

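Actually, in the production script the interval is 300 seconds, which is longer than any timeout, and I check that the previous getData() call has completed. This script was simplified for readability – BGE 2011-05-22 08:53:58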