2017-10-07 85 views

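Scraping locally loaded images

I'm learning Beautiful Soup and have run into a problem while trying to scrape images that are loaded from a local directory. The error I see is: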

ValueError: unknown url type: 'images/ixa2.png' 

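I think what is happening is that the images are loaded from a local directory rather than being hosted at a URL. This is what the element I'm trying to scrape looks like when I inspect it: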

<img width="200" align="left" hspace="0" src="ixa/cards/axisofmortality.jpg"> 

I'm curious whether it is possible to scrape these images and, if so, how.

Here is the code I'm working with:

from urllib import request 
import urllib.request 
from bs4 import BeautifulSoup as soup 

def make_soup(url): 
    result = request.urlopen(url) 
    page = result.read() 

    parsed_page = soup(page, "html.parser") 
    result.close() 
    return parsed_page 

def get_images(url): 
    soup = make_soup(url) 
    images = [img for img in soup.findAll('img')] 
    print (str(len(images)) + "images found.") 
    print('Downloading images to current working directory.') 
    #compile our unicode list of image links 
    image_links = [each.get('src') for each in images] 
    for each in image_links: 
     filename=each.split('/')[-1] 
     urllib.request.urlretrieve(each, filename) 
    return image_links 

get_images('http://mythicspoiler.com/') 

Answers

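You are trying to download the images from incomplete URLs: each src value is a relative path, not a full address. My suggestion is this: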

def get_images(url):
    soup = make_soup(url)
    images = soup.find_all('img')
    print(str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    # collect the src attribute of every <img> tag
    image_links = [each.get('src') for each in images]
    for each in image_links:
        filename = each.split('/')[-1]
        # prepend the site root so the relative src becomes a full URL
        urllib.request.urlretrieve('http://mythicspoiler.com/' + each, filename)
    return image_links
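
As a possible variation on the above, the standard library's urljoin can build the full address instead of plain string concatenation. This is only a sketch: resolve_image_links is a hypothetical helper name, not part of the original code, and the two paths are the ones quoted in the question. The potential benefit is that a src value that is already a complete URL would be passed through unchanged rather than getting the site root glued onto the front.

```python
from urllib.parse import urljoin

def resolve_image_links(page_url, srcs):
    """Resolve each src value against the page URL (hypothetical helper)."""
    # Skip <img> tags that had no src attribute (get('src') returns None).
    return [urljoin(page_url, src) for src in srcs if src]

links = resolve_image_links(
    "http://mythicspoiler.com/",
    ["images/ixa2.png", "ixa/cards/axisofmortality.jpg"],
)
for link in links:
    print(link)
```

Each resolved link could then be handed to urllib.request.urlretrieve(link, filename) in place of the concatenated string.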

Simple and effective, thank you. – Bonteq