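HTTP request overload / timeouts using Python

I have a Python script running that basically requests 1000 URLs over HTTP and logs their responses. Here is the function that downloads a URL's page.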

import os
import urllib2
from urlparse import urlsplit

# ImageDestinationPath, _Save_image_to_s3 and update_database are defined
# elsewhere in the script.

def downld_url(url, output):
    print "Entered Downld_url and scraping the pdf/doc/docx file now..."
    global error
    try:
        # determine all extensions we should account for
        f = urllib2.urlopen(url)
        data = f.read()
        dlfn = urlsplit(url).path.split('.')[-1]
        print "The extension of the file is: " + str(dlfn)
        dwnladfn = ImageDestinationPath + "/" + output + "." + dlfn
        # the with-block closes the file on exit, so no explicit close() is needed
        with open(dwnladfn, "wb") as code:
            code.write(data)
        _Save_image_to_s3(output + "." + dlfn, dwnladfn)
        print dlfn + " file saved to S3"
        os.remove(dwnladfn)
        print dlfn + " file removed from local folder"
        update_database(output, output + "." + dlfn, None)
        return
    except Exception as e:
        error = "download error: " + str(e)
        print "Error in downloading file: " + error
        return

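This runs smoothly for the first 100-200 URLs in the pipeline, but after that the responses start getting very slow and eventually they just time out. I am guessing this is because of request overload. Is there an efficient way to do this without overloading the requests?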

Asked 2014-03-07 by Scooby

Note: most of these URLs point to .png and .pdf files to download. – Scooby

Unrelated: use 'urlparse' to parse the URL, and 'os.path' / 'posixpath' to manipulate paths. – jfs
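A minimal sketch of what that comment suggests, written for Python 3, where `urlparse` lives in `urllib.parse`. The helper names `extension_from_url` and `build_local_path` and the example URLs are made up for illustration; they are not part of the original script.

```python
import os
import posixpath
from urllib.parse import urlsplit


def extension_from_url(url):
    """Return the extension of the URL's path without the dot, or ''."""
    path = urlsplit(url).path           # query string and fragment are dropped
    _, ext = posixpath.splitext(path)   # URL paths always use '/' separators
    return ext[1:]


def build_local_path(directory, output, url):
    """Build the local file name for a download (hypothetical helper)."""
    ext = extension_from_url(url)
    filename = output + "." + ext if ext else output
    return os.path.join(directory, filename)
```

Unlike `path.split('.')[-1]`, `posixpath.splitext` only looks at the last path component, so a URL such as `http://example.com/dir.v2/file` yields an empty extension instead of `v2/file`.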

Which requests become slow: is it urlopen(), _Save_image_to_s3(), update_database(), or something else? – jfs
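One way to answer that question is to time each stage separately. The following is a rough sketch in Python 3; the `timed` context manager, the `stage_totals` / `stage_counts` dictionaries, and `report` are hypothetical names introduced here, not something from the original script.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_totals = defaultdict(float)  # stage name -> accumulated seconds
stage_counts = defaultdict(int)    # stage name -> number of calls


@contextmanager
def timed(stage):
    """Accumulate the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the time even if the wrapped code raised.
        stage_totals[stage] += time.perf_counter() - start
        stage_counts[stage] += 1


def report():
    """Return one summary line per stage, sorted by stage name."""
    lines = []
    for stage in sorted(stage_totals):
        count = stage_counts[stage]
        average = stage_totals[stage] / count
        lines.append("%s: %d calls, %.3f s avg" % (stage, count, average))
    return lines
```

Inside `downld_url`, each step would be wrapped on its own, for example `with timed("urlopen"): ...` around the download and `with timed("s3"): ...` around the upload. Printing `report()` every 50 URLs or so should show which stage degrades over time.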

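I don't know where the problem comes from, but if it is related to making too many requests in the same process, you could try multiprocessing as a workaround.

It might also speed up the whole thing, since you can run several tasks at the same time (for example, one process downloading while another writes to disk, ...). I did something similar and it worked much better (the total download speed increased too).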
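A rough sketch of that idea in Python 3 follows. It uses `multiprocessing.pool.ThreadPool`, which offers the same `imap_unordered` interface as `multiprocessing.Pool` but runs workers as threads; that is a substitution made here because downloads are mostly I/O-bound and threads avoid pickling restrictions. Switching to `multiprocessing.Pool` gives real separate processes, provided the worker is a module-level function and the call sits under `if __name__ == "__main__":`. The names `fetch` and `fetch_all`, the 30-second timeout, and the default of 8 workers are all assumptions, not values from the original post.

```python
import urllib.request
from multiprocessing.pool import ThreadPool


def fetch(url, timeout=30):
    """Download one URL; return (url, data, error) instead of raising."""
    try:
        # The explicit timeout keeps a stalled server from blocking a worker
        # forever; the with-block closes the connection when done.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return url, response.read(), None
    except Exception as exc:
        return url, None, str(exc)


def fetch_all(urls, worker=fetch, workers=8):
    """Apply worker to every URL with at most `workers` calls in flight."""
    with ThreadPool(workers) as pool:
        # imap_unordered yields each result as soon as it is ready, so one
        # slow URL does not hold back the others.
        return list(pool.imap_unordered(worker, urls))
```

The pool size also acts as a cap on concurrent requests, so the remote servers see at most `workers` connections at a time. Saving to S3 and updating the database could then happen in the loop that consumes the results.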

Answered 2014-03-07 17:02:25
