2017-07-26 210 views

I'm having a problem with this: I want to get an image URL, but I don't know how to display just a single image. For example:

<img srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w" src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg"> 

As you can see above, there are several alternative images, but I am trying to scrape just one of them to display.

import bs4 as bs 
import urllib.request 
import datetime 
import random 
import re 


random.seed(datetime.datetime.now().timestamp())  # seed with a float; Python 3.11+ rejects datetime objects

sauce = urllib.request.urlopen('http://www.manchestereveningnews.co.uk/news/greater-manchester-news').read() 
soup = bs.BeautifulSoup(sauce, 'lxml') 

# 




title = soup.title 
link = soup.link 
# image = re.search(img 'srcset=img(.*?),)
# The attempt above doesn't work (it is not valid Python); not sure how to do this

strong = soup.strong 
description = soup.description 
location = soup.location 


title = soup.find('h1', class_='publication-font')

image = soup.find('img')
strong = soup.find('strong')
location = soup.find('em').find('a')
description = soup.find('div', class_='description')


#Previous Code 
print("H1:", title.text) 
print("Article Link:", link) 
print("Image Url:\n", image) 
print("1st Paragraph:\n", strong.text) 
print("2nd Paragraph:\n", description.string) 
print("Location:\n", location.text) 

My code is above. The output from my previous attempt was:

Greater Manchester News
<link href="rss.xml" rel="alternate" title="Default home feed" type="application/rss+xml"/>
<img data-src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg" data-srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w"/>
Family of dad stabbed in the neck while defending his fiancée from thugs speak of their heartbreak
Mike Grimshaw, 34, died after being stabbed in the neck outside his home in Trafford last Thursday
Trafford

The result shows several image names, but I am trying to display only one image link. How do I go about this?

Any ideas would be greatly appreciated.

Answers


You can access the data-src or data-srcset attribute to get the image you want:

image = soup.find('img')
single_img = image.get('data-src')  # returns the main image link

import re
image = soup.find('img')
img_string = image.get('data-srcset')  # returns a string that you have to parse
img_set = re.findall(r'(https?://[^\s]+)', img_string)  # regex that matches only the links

You can then access whichever index you want in img_set (just check the length of the list first).
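If you would rather choose an image by its width than by list position, the srcset string can be split into (width, URL) pairs first. The following is only a minimal sketch using the standard library. The helper names parse_srcset and pick_image are hypothetical, the example.com URLs are placeholders, and it assumes every candidate uses a width descriptor such as "180w" and that the URLs contain no commas.

```python
def parse_srcset(srcset):
    """Split a srcset-style string into (width, url) pairs, sorted by width.

    Hypothetical helper: assumes each candidate looks like "<url> <N>w".
    """
    candidates = []
    for part in srcset.split(','):
        pieces = part.split()
        # Keep only well-formed "<url> <N>w" candidates
        if len(pieces) == 2 and pieces[1].endswith('w') and pieces[1][:-1].isdigit():
            candidates.append((int(pieces[1][:-1]), pieces[0]))
    return sorted(candidates)


def pick_image(srcset, max_width=None):
    """Return one URL: the widest overall, or the widest that fits max_width."""
    candidates = parse_srcset(srcset)
    if not candidates:
        return None  # nothing usable in the string
    if max_width is None:
        return candidates[-1][1]
    fitting = [c for c in candidates if c[0] <= max_width]
    # Fall back to the smallest image when nothing fits
    return (fitting[-1] if fitting else candidates[0])[1]


# Placeholder data in the same shape as the data-srcset value in the question
sample = ("http://example.com/s180/a.jpg 180w, "
          "http://example.com/s390/a.jpg 390w, "
          "http://example.com/s458/a.jpg 458w")

print(pick_image(sample))       # widest candidate
print(pick_image(sample, 400))  # widest candidate that is at most 400 pixels wide
```

With the page from the question you would pass image.get('data-srcset') instead of sample, after checking that it is not None.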