2017-06-17 198 views

BeautifulSoup: find all image links on Imgur

I can't find all the links in an Imgur album.

Here is the HTML from Imgur:

<div class="post-image">... 
<a href="//i.imgur.com/P1VMco8.png" class="zoom"><img src="//i.imgur.com/P1VMco8.png" alt="" itemprop="contentURL" /> 

How can I extract only the href values from the page? I am using the code below, which returns the whole matching elements.

import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen('https://imgur.com/a/OmD1E') as f:
    r = f.read()
    soup = BeautifulSoup(r, 'lxml')
    result = soup.select(".post-image a")

Answers

1

The following code prints all the image links:

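import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
from bs4 import BeautifulSoup

# Download the album page and parse it.
with urllib.request.urlopen('https://imgur.com/a/OmD1E') as f:
    soup = BeautifulSoup(f.read(), 'lxml')

# Each .post-image block wraps an <a> whose href is the image URL.
for image in soup.select(".post-image"):
    print(image.a["href"])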

If you only want the first .post-image, use:

import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen('https://imgur.com/a/OmD1E') as f:
    soup = BeautifulSoup(f.read(), 'lxml')

# Index 0 is the first .post-image block on the page.
print(soup.select(".post-image")[0].a["href"])
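
As a rough sketch of a variation on the snippets above, the following collects the links into a list instead of printing them and turns the protocol-relative //i.imgur.com/... values into full https URLs. It runs against an inline HTML sample modeled on the question's markup rather than the live page. The function name image_links, the base_url default, and the second image ID are made up for illustration, and html.parser stands in for lxml only to avoid an extra dependency.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Inline sample modeled on the question's snippet.
# The second image ID (EXAMPLE2) is hypothetical.
SAMPLE_HTML = """
<div class="post-image">
  <a href="//i.imgur.com/P1VMco8.png" class="zoom"><img src="//i.imgur.com/P1VMco8.png" alt="" itemprop="contentURL" /></a>
</div>
<div class="post-image">
  <a href="//i.imgur.com/EXAMPLE2.png" class="zoom"><img src="//i.imgur.com/EXAMPLE2.png" alt="" /></a>
</div>
<div class="post-image"></div>
"""


def image_links(html, base_url="https://imgur.com/"):
    """Return absolute URLs for every link found inside a .post-image element."""
    soup = BeautifulSoup(html, "html.parser")
    # "a[href]" only matches anchors that carry an href attribute, so
    # empty .post-image blocks or bare anchors are skipped instead of
    # raising an error.
    anchors = soup.select(".post-image a[href]")
    # urljoin fills in the scheme of base_url for "//host/path" values.
    return [urljoin(base_url, a["href"]) for a in anchors]


links = image_links(SAMPLE_HTML)
for link in links:
    print(link)
```

Selecting ".post-image a[href]" directly, rather than indexing image.a["href"] on each block, is a design choice here: a block without a link is simply left out of the list.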

Thanks. With a small modification, I now have a list of image URLs. – DatCra