我的代码是否泄漏内存（python）？

links_list = char.getLinks(words) 
    for source_url in links_list: 
     try: 
      print 'Downloading URL: ' + source_url 
      urldict = hash_url(source_url) 
      source_url_short = urldict['url_short'] 
      source_url_hash = urldict['url_short_hash'] 
      if Url.objects.filter(source_url_short = source_url_short).count() == 0: 
        try: 
         htmlSource = getSource(source_url) 
        except: 
         htmlSource = '-' 
         print '\thtmlSource got an error...' 
       new_u = Url(source_url = source_url, source_url_short = source_url_short, source_url_hash = source_url_hash, html = htmlSource) 
       new_u.save() 
       time.sleep(3) 
      else: 
       print '\tAlready in database' 
     except: 
      print '\tError with downloading URL..' 
      time.sleep(3) 
      pass 


def getSource(theurl, unicode = 1, moved = 0): 
    if moved == 1: 
     theurl = urllib2.urlopen(theurl).geturl() 
    urlReq = urllib2.Request(theurl) 
    urlReq.add_header('User-Agent',random.choice(agents)) 
    urlResponse = urllib2.urlopen(urlReq) 
    htmlSource = urlResponse.read() 
    htmlSource = htmlSource.decode('utf-8').encode('utf-8') 
    return htmlSource

基本上这个代码的作用是...它需要一个URL列表并下载它们，并将它们保存到数据库中。就这样。我的代码是否泄漏内存（python）？

来源

2009-11-28 TIMEX

是有一个原因你认为你的代码泄漏内存？ – Jehiah 2009-11-28 03:09:42

发生任何错误？或花费太多时间？虽然'htmlSource.decode（'utf-8'）。encode（'utf-8'）'这个技术很奇怪，它的解码来自utf8并且同时编码回utf8。 – YOU 2009-11-28 03:10:45

没有错误发生。但是，我的脚本随机被“杀死”。之前有人建议这是内存泄漏，导致我的内存过载。 – TIMEX 2009-11-28 03:12:14

也许你的过程中使用了太多的内存和服务器（可能是共享的主机）只是杀死它，因为你耗尽你的内存配额。

这里使用一个调用，吃了很多的记忆：

links_list = char.getLinks(words) 
for source_url in links_list: 
    ...

看起来你可能会建设一个记忆整个名单，然后与项目合作。相反，使用迭代器可能会更好，其中一次只能检索一个对象。但是这是一个猜测，因为很难从你的代码中知道char.getLinks是什么

如果你在调试模式下使用Django，那么内存的使用将会上升，正如Mark所建议的那样。

来源

2009-11-28 03:42:54 Evgeny

如果你在Django中这样做，确保DEBUG是False，否则它会缓存每个查询。

See FAQ

来源

2009-11-28 03:29:21

检查最简单的方法是转到任务管理器（在Windows上 - 或其他平台上的等效项），并检查Python进程的内存需求。如果保持不变，则不存在内存泄漏。如果不是这样，你就有内存泄漏的地方，你需要调试

来源

2009-11-28 03:49:14 inspectorG4dget

也许你应该得到一个工作服务器，如beanstalkd，并考虑一次只做一个。

作业服务器将重新请求失败的作业服务器，从而完成剩下的工作。如果您需要（甚至在多台计算机上），也可以同时运行多个客户端。

设计简单，易于理解和测试，更多的容错，重试，更具可扩展性，等等

来源

2009-11-28 04:30:08 Dustin

我的代码是否泄漏内存（python）？

回答

相关问题