HTTP requests overloading / timing out in Python

I have a Python script running that basically requests 1000 URLs over HTTP and records their responses. This is the function that downloads the URL/page:
import os
import urllib2
from urlparse import urlsplit

def downld_url(url, output):
    print "Entered Downld_url and scraping the pdf/doc/docx file now..."
    global error
    try:
        # download the raw bytes
        f = urllib2.urlopen(url)
        data = f.read()
        # determine the file extension from the URL path
        dlfn = urlsplit(url).path.split('.')[-1]
        print "The extension of the file is: " + str(dlfn)
        dwnladfn = ImageDestinationPath + "/" + output + "." + dlfn
        # write the downloaded data to a local file
        with open(dwnladfn, "wb") as code:
            code.write(data)
        # upload to S3, then remove the local copy
        _Save_image_to_s3(output + "." + dlfn, dwnladfn)
        print dlfn + " file saved to S3"
        os.remove(dwnladfn)
        print dlfn + " file removed from local folder"
        update_database(output, output + "." + dlfn, None)
        return
    except Exception as e:
        error = "download error: " + str(e)
        print "Error in downloading file: " + error
        return
It runs smoothly for the first 100-200 URLs, but after that the responses start getting very slow and eventually just time out. My guess is that this happens because the requests are getting overloaded. Is there an efficient way to do this without overloading the requests?
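One way to keep requests from piling up, sketched below under the assumption that the slowdown is in urlopen(), is to give every request an explicit timeout and push the downloads through a small thread pool so only a few are in flight at a time. The fetch_one/fetch_all helpers, the 30-second timeout, and the worker count are illustrative choices, not part of the original script:

from multiprocessing.dummy import Pool  # thread-based Pool from the stdlib
import urllib2

def fetch_one(url):
    # hypothetical helper: one bounded download that fails fast instead of hanging
    try:
        f = urllib2.urlopen(url, timeout=30)
        return url, f.read(), None
    except Exception as e:
        return url, None, str(e)

def fetch_all(urls, workers=5):
    # at most `workers` requests are in flight at any moment
    pool = Pool(workers)
    try:
        return pool.map(fetch_one, urls)
    finally:
        pool.close()
        pool.join()

Keeping the pool small gives the remote servers and your own network stack room to breathe, which is usually what "overloading" turns out to mean in practice.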
Note: most of these URLs are .png and .pdf files that get downloaded. – Scooby
Unrelated: use 'urlparse' to parse the URL and 'os.path'/'posixpath' to manipulate the path. – jfs
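A minimal sketch of that suggestion (the example URL and variable names are mine); posixpath is the right module here because URL paths always use forward slashes:

import posixpath
from urlparse import urlsplit

path = urlsplit("http://example.com/files/report.v2.pdf").path
ext = posixpath.splitext(path)[1]    # '.pdf' -- dots inside the name stay intact
name = posixpath.basename(path)      # 'report.v2.pdf'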
Which call is getting slow: is it urlopen(), or _Save_image_to_s3(), or update_database(), or something else? – jfs
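One rough way to answer that (the timed() helper is hypothetical, not from the original code) is to wrap each stage and print how long it takes:

import time

def timed(label, func):
    # hypothetical helper: report how long one stage of downld_url takes
    start = time.time()
    result = func()
    print label + " took %.2f s" % (time.time() - start)
    return result

# illustrative use inside downld_url:
# data = timed("urlopen + read", lambda: urllib2.urlopen(url).read())
# timed("S3 upload", lambda: _Save_image_to_s3(output + "." + dlfn, dwnladfn))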