Download an image from a web page using Python

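I'm trying to write a Python script that downloads an image from a web page. On the web page (I'm using NASA's Astronomy Picture of the Day page), a new picture is posted every day, each with a different file name.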

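So my solution was to parse the HTML with HTMLParser, look for "jpg", and write the path and file name of the image to an attribute of the HTML parser object (named "output", see the code below).

I'm new to Python and OOP (this is my first real Python script), so I'm not sure whether this is how it is usually done. Any advice and pointers are welcome.

Here is my code: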

import urllib2
from HTMLParser import HTMLParser

# Grab the page HTML (Python 2)
response = urllib2.urlopen('http://apod.nasa.gov/apod/astropix.html')
html = response.read()

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # Only look at anchor tags.
        if tag == "a":
            # Check the list of defined attributes.
            for name, value in attrs:
                # If href is defined and ends with "jpg", remember it.
                if name == "href":
                    if value[len(value)-3:len(value)] == "jpg":
                        self.output = value  # path + file name of the image

parser = MyHTMLParser()
parser.feed(html)
imgurl = 'http://apod.nasa.gov/apod/' + parser.output


2013-03-11 Cici

What is the question? – piokuc 2013-03-11 23:19:51

If your code is working and you just want comments on possible improvements, you can ask the good folks at Code Review: http://codereview.stackexchange.com/ – bernie 2013-03-11 23:25:32

...I didn't know Code Review existed... thanks – Cici 2013-03-11 23:32:50

To check whether a string ends with "jpg", you can use .endswith() instead of len() and slicing:

if name == "href" and value.endswith("jpg"): 
    self.output = value
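
For reference, here is a minimal sketch of how the whole parser might look on Python 3 with that change applied. It is not the asker's code: the class name `JpgLinkParser`, the helper `find_jpg_links`, and the sample HTML string are made up for illustration, and it collects every matching link in a list instead of keeping only the last one. It also uses `urljoin` to build the absolute URL rather than string concatenation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class JpgLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose target ends with .jpg."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # page URL, used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == "href" and value and value.lower().endswith(".jpg"):
                self.links.append(urljoin(self.base_url, value))


def find_jpg_links(html_text, base_url):
    """Hypothetical helper: return absolute URLs of all .jpg links."""
    parser = JpgLinkParser(base_url)
    parser.feed(html_text)
    return parser.links


# Made-up sample markup, loosely shaped like an image link on the page
sample = '<a href="image/1303/picture.jpg"><img src="image/1303/picture_small.jpg"></a>'
print(find_jpg_links(sample, "http://apod.nasa.gov/apod/astropix.html"))
```

With the sample above, the relative href is resolved against the page URL, so the result is a single absolute link under `http://apod.nasa.gov/apod/`.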

If the search within the web page is more complex, you could use lxml.html or BeautifulSoup instead of HTMLParser, for example:

from lxml import html 

# download & parse web page 
doc = html.parse('http://apod.nasa.gov/apod/astropix.html').getroot() 

# find <a href that ends with ".jpg" and 
# that has <img child that has src attribute that also ends with ".jpg" 
for elem, attribute, link, _ in doc.iterlinks():
    if (attribute == 'href' and elem.tag == 'a' and link.endswith('.jpg') and
            len(elem) > 0 and elem[0].tag == 'img' and
            elem[0].get('src', '').endswith('.jpg')):
        print(link)
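
Once the image URL is known, the file still has to be saved. A rough sketch of that step on Python 3, using only the standard library, could look like the following. The function names `filename_from_url` and `download_image` and the example URL are assumptions for illustration, not part of the question. The local file name is taken from the last component of the URL path, which suits a page where the file name changes every day.

```python
import os
import posixpath
import shutil
from urllib.parse import urlsplit
from urllib.request import urlopen


def filename_from_url(url):
    """Return the last path component of the URL, e.g. 'example.jpg'."""
    return posixpath.basename(urlsplit(url).path)


def download_image(url, dest_dir="."):
    """Stream the resource at url into dest_dir and return the local path."""
    local_path = os.path.join(dest_dir, filename_from_url(url))
    with urlopen(url) as response, open(local_path, "wb") as out_file:
        # Copy in chunks so the whole image is never held in memory at once
        shutil.copyfileobj(response, out_file)
    return local_path


# Hypothetical image URL, used here only to show the file name extraction
print(filename_from_url("http://apod.nasa.gov/apod/image/1303/example.jpg"))
```

Calling `download_image(imgurl)` with the URL found by the parser would then write the file into the current directory under its original name.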


2013-03-12 00:32:41 jfs
