2015-09-27 92 views
0

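Image download in Python

Hi, I have a file named image.txt that contains about 500,000 image URLs. I want to read the URLs, download the images, and save them in a directory. If an image cannot be downloaded, I want to print the exception and carry on downloading the remaining files. How can I do this in an optimized way?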

import sys
import os
import urllib


def isValidFile(path):
    if not os.path.isfile(path):
        print "Path " + path + " doesn't exist! Aborting..."
        exit(1)


def isValidDir(path):
    if not os.path.isdir(path):
        print "Path " + path + " doesn't exist! Aborting..."
        exit(1)


def normalize(url):
    url = url.split("/")[-1]
    return url.split("\n")[0]

# Execution Starts Here
urls = sys.argv[1]
isValidFile(urls)

out_dir = sys.argv[2]
isValidDir(out_dir)

with open(urls) as url_array:
    for url in url_array:
        urllib.urlretrieve(url, os.path.join(out_dir, normalize(url)))

    print("Images Downloaded")
+2

What is the problem with your existing code? Are you getting an error? –

+2

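I'm quite confused by your code and by what you are trying to do. Could you post it with correct indentation? (The indentation is wrong, especially in the second half.) – Zizouz212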

Answers

0

If you want a pure-Python solution, you can try this:

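import urllib
import os

def getImage(url, dest):
    # Fetch first, so a failed download does not leave an empty file behind
    data = urllib.urlopen(url).read()
    with open(dest, 'wb') as fh:
        fh.write(data)

# urlArray is any iterable of URL strings (not defined here)
for url in urlArray:
    try:
        getImage(url, os.path.basename(url))
    except Exception as e:
        # Report the failure and move on to the next URL
        print "Error downloading {}: {}".format(url, e)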
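
Since the question asks for an "optimized" way to handle roughly 500,000 URLs, here is a rough sketch of the same try/except idea running several downloads at once. It is written for Python 3 (urllib.request and concurrent.futures), whereas the snippet above uses the Python 2 urllib module. The names fetch, save_image and download_all, the timeout of 30 seconds and the default of 16 workers are all assumptions made for illustration, not something from the question.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlsplit


def fetch(url, timeout=30):
    """Return the raw bytes behind url (raises on network or HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_image(url, out_dir, fetcher=fetch):
    """Download one URL into out_dir and return the path that was written."""
    # Use the last path component as the file name, ignoring any query string
    name = os.path.basename(urlsplit(url).path) or "unnamed"
    data = fetcher(url)  # download first, so a failure leaves no empty file
    dest = os.path.join(out_dir, name)
    with open(dest, "wb") as fh:
        fh.write(data)
    return dest


def download_all(urls, out_dir, workers=16, fetcher=fetch):
    """Download urls concurrently; return a list of (url, error) failures."""
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(save_image, u, out_dir, fetcher): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                fut.result()
            except Exception as exc:
                # Print the exception and keep going with the other URLs
                print("Error downloading {}: {}".format(url, exc))
                failed.append((url, exc))
    return failed
```

You would call it as download_all(urlArray, out_dir) and get back the list of URLs that failed, which you could retry later. Two caveats: images from different sites that share a file name (say, two image1.jpg files) overwrite each other in this sketch, and all tasks are submitted up front, so for a very long list you may prefer to feed the pool in batches.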
+0

What is urlArray? – rhya

+0

It's basically an iterable (e.g. a 'list') containing the URLs to download. For example, urlArray = ['http://www.somedomain.com/image1.jpg', 'http://www.anotherdomain.com/image1.jpg'], and so on. – jorgeh
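
To tie this back to the question, urlArray could be built from the image.txt file rather than typed by hand. A minimal sketch, where read_urls is a hypothetical helper name: it strips each line, because iterating over a file yields lines that still end in a newline (the question's loop passes that newline along to urlretrieve), and it skips blank lines.

```python
def read_urls(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]
```

With that in place, urlArray = read_urls('image.txt') gives a list that can be fed to the loop in the answer.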