BeautifulSoup: scraping a list of embedded href links

I am working with information about some recently trending videos at https://www.youtube.com/feed/trending. I load the page into BeautifulSoup, but I get an error when I try to loop over the list of divs I need to parse.

import urllib2 
from bs4 import BeautifulSoup 

url = 'https://www.youtube.com/feed/trending' 
page = urllib2.urlopen(url) 
soup = BeautifulSoup(page,'html.parser') 

#narrow in to divs with relevant meta-data 
videos = soup.find_all('div',class_='yt-lockup-content') 
videos[50].div.a['href'] #checking one specific DIV 
>>u'user/nameofchannel' #works

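Up to this point I get back the information I need, but when I try to run through all of the divs (there are 70+ on this page), I get an error related to the data type this method returns.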

for v in videos: 
    videos[v].div.a['href'] 
>> TypeError: list indices must be integers, not Tag

How can I loop over the list of divs returned in videos and print the list of values matching videos[n].div.a['href']?

2017-02-11 James

for v in range(len(videos)): 
    videos[v].div.a['href']

What you need is an index into the videos list, not one of its Tag elements.

Better:

for index, value in enumerate(videos): 
    videos[index].div.a['href']

Much better:

[v.div.a['href'] for v in videos]

A list comprehension is the recommended approach for a task like this.
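
As a rough illustration of the same idea, the sketch below iterates over the Tag objects directly and skips any div that lacks a nested link, which the one-line comprehension would trip over. The sample HTML and the helper name extract_hrefs are hypothetical stand-ins, not the real YouTube markup, and the code is written for Python 3.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the trending page.
SAMPLE_HTML = """
<div class="yt-lockup-content"><div><a href="/user/channel_one">One</a></div></div>
<div class="yt-lockup-content"><div><a href="/user/channel_two">Two</a></div></div>
<div class="yt-lockup-content"><span>no nested div or link here</span></div>
"""


def extract_hrefs(html):
    """Return the href of the first nested div > a in each matching div."""
    soup = BeautifulSoup(html, "html.parser")
    videos = soup.find_all("div", class_="yt-lockup-content")
    hrefs = []
    for v in videos:  # v is a Tag, not an integer index
        inner = v.div  # None when there is no nested div
        link = inner.a if inner is not None else None
        if link is not None and link.has_attr("href"):
            hrefs.append(link["href"])
    return hrefs


hrefs = extract_hrefs(SAMPLE_HTML)
print(hrefs)
```

Here the third div is skipped quietly instead of raising an error on a missing element. Whether skipping is the right behavior depends on what the page actually contains.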

2017-02-11 08:12:26

Thanks! The list comprehension form works, but the first one did not. Error: "TypeError: 'int' object is not iterable" – James
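
For context, one situation that produces this message is a for loop handed an integer instead of a sequence, for example looping over len(videos) without wrapping it in range. The sketch below reproduces that with a hypothetical stand-in list rather than real page data. The helper name loop_over_len is made up for illustration.

```python
# Hypothetical stand-in for the list returned by find_all.
videos = ["first", "second", "third"]


def loop_over_len(items):
    """Try to iterate over an integer and return the resulting error text."""
    try:
        for v in len(items):  # len() returns an int, which is not iterable
            pass
    except TypeError as exc:
        return str(exc)
    return None


error_message = loop_over_len(videos)
print(error_message)

# Wrapping the length in range() yields the integer indices instead.
indices = [i for i in range(len(videos))]
print(indices)
```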
