How can I get all YouTube comments with Python's gdata module?

I'm looking to grab all the comments on a given video, rather than going through them one page at a time.

from gdata import youtube as yt 
from gdata.youtube import service as yts 

client = yts.YouTubeService() 
client.ClientLogin(username, pwd) #the pwd might need to be application specific fyi 

comments = client.GetYouTubeVideoComments(video_id='the_id') 
a_comment = comments.entry[0]

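The code above lets you grab a single comment, probably the most recent one, but I'm looking for a way to grab all the comments at once. Is this possible with Python's gdata module?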

References: the YouTube API docs for comments, the comment feed docs, and the Python API docs.

2012-10-10 TankorSmash

This was answered here (http://stackoverflow.com/questions/10941803/using-youtube-api-to-get-all-comments-from-a-video-with-the-json-feed) with a PHP solution, since the YouTube PHP API has a call that allows it. I don't think a pure Python answer is out there. –

@KenB I saw that too. It's a shame. The video in question has 9k comments, and I don't think making 360 'GetNextLink' calls is the best way to go. – TankorSmash

The 'www.youtube.com/all_comments?v=video_id' URL has a parseable list of comments, but it takes a long time to load. I suppose I could try that. – TankorSmash

The following does what you asked for, using the Python YouTube API:

from gdata.youtube import service 

USERNAME = '[email protected]' 
PASSWORD = 'a_very_long_password' 
VIDEO_ID = 'wf_IIbT8HGk' 

def comments_generator(client, video_id):
    # Fetch the first page, then follow the "next" links until none is left.
    comment_feed = client.GetYouTubeVideoCommentFeed(video_id=video_id)
    while comment_feed is not None:
        for comment in comment_feed.entry:
            yield comment
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            comment_feed = None
        else:
            comment_feed = client.GetYouTubeVideoCommentFeed(next_link.href)

client = service.YouTubeService() 
client.ClientLogin(USERNAME, PASSWORD) 

for comment in comments_generator(client, VIDEO_ID): 
    author_name = comment.author[0].name.text 
    text = comment.content.text 
    print("{}: {}".format(author_name, text))

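Unfortunately, the API limits the number of entries that can be retrieved. This is the error I got when I tried a tweaked version with a hand-crafted GetYouTubeVideoCommentFeed URL parameter: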

gdata.service.RequestError: {'status': 400, 'body': 'You cannot request beyond item 1000.', 'reason': 'Bad Request'}

Note that the same principle should apply when retrieving entries from the API's other feeds.
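As a rough sketch of that principle, the paging loop can be written once and reused for any feed that exposes `entry` and `GetNextLink()`. The names `iterate_entries`, `FakeFeed` and `FakeLink` are hypothetical; the fake classes only stand in for gdata feed objects so the example runs without the library or a network connection.

```python
def iterate_entries(first_feed, fetch_feed):
    """Yield every entry of a paginated feed by following its next links."""
    feed = first_feed
    while feed is not None:
        for entry in feed.entry:
            yield entry
        next_link = feed.GetNextLink()
        # Stop when there is no further page, otherwise fetch it by URL.
        feed = fetch_feed(next_link.href) if next_link is not None else None


# Hypothetical stand-ins for gdata objects, only here to make the sketch runnable.
class FakeLink(object):
    def __init__(self, href):
        self.href = href


class FakeFeed(object):
    def __init__(self, entry, next_href=None):
        self.entry = entry
        self._next_href = next_href

    def GetNextLink(self):
        return FakeLink(self._next_href) if self._next_href else None


pages = {
    "page2": FakeFeed(["c", "d"], "page3"),
    "page3": FakeFeed(["e"]),
}
first = FakeFeed(["a", "b"], "page2")
all_entries = list(iterate_entries(first, pages.get))
```

With a real client, the call would presumably look like `iterate_entries(client.GetYouTubeVideoCommentFeed(video_id=VIDEO_ID), client.GetYouTubeVideoCommentFeed)`, mirroring the generator above.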

If you want to hand-craft the GetYouTubeVideoCommentFeed URL parameter, its format is:

'https://gdata.youtube.com/feeds/api/videos/{video_id}/comments?start-index={start_index}&max-results={max_results}'

The following restrictions apply: start-index <= 1000 and max-results <= 50.
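A minimal sketch of building those URLs page by page, under two assumptions: that `start-index` is 1-based, and that no page may reach past item 1000 (inferred from the error message above). The helper name `comment_page_urls` and the constants are hypothetical.

```python
FEED_URL = ("https://gdata.youtube.com/feeds/api/videos/{video_id}/comments"
            "?start-index={start_index}&max-results={max_results}")

MAX_RESULTS_LIMIT = 50  # max-results <= 50, as stated above
LAST_ITEM = 1000        # assumption: taken from "You cannot request beyond item 1000."


def comment_page_urls(video_id, max_results=MAX_RESULTS_LIMIT):
    """Return the feed URLs that cover items 1..LAST_ITEM in pages."""
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError("max_results must be between 1 and 50")
    urls = []
    for start_index in range(1, LAST_ITEM + 1, max_results):
        # Shrink the final page so it never asks for items past LAST_ITEM.
        page_size = min(max_results, LAST_ITEM - start_index + 1)
        urls.append(FEED_URL.format(video_id=video_id,
                                    start_index=start_index,
                                    max_results=page_size))
    return urls


urls = comment_page_urls("wf_IIbT8HGk")
```

Each URL could then be passed to `GetYouTubeVideoCommentFeed` in turn, stopping early once a page comes back with no entries.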

2012-10-10 20:38:16

Awesome. Do you know if there's a way to set 'start_index' or 'items_per_page' manually? Setting it on the first set of comments doesn't seem to do anything. – TankorSmash

You just need to pass a URL of the following format to 'GetYouTubeVideoCommentFeed': 'https://gdata.youtube.com/feeds/api/videos/{video_id}/comments?start-index={start_index}&max-results={max_results}'. The following restrictions apply: 'start-index <= 1000' and 'max-results <= 50'. –

Awesome, I didn't even think of changing the URI. Cheers! – TankorSmash

This is the only solution I have right now, but it doesn't use the API, and it gets slow when there are several thousand comments.

import bs4
import urllib2

# grab the page source for the video
data = urllib2.urlopen(r'http://www.youtube.com/all_comments?v=video_id')  # example: XhFtHW4YB7M
# pull out the comments
soup = bs4.BeautifulSoup(data)
cmnts = soup.findAll(attrs={'class': 'comment yt-tile-default'})
# do something with them, e.g. count them
print len(cmnts)

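Note that because 'class' is a reserved word in Python, you can't do the usual 'startswith' search via regex or lambdas as seen here, since you're passing a dict instead of regular keyword parameters. It also gets pretty slow because of BeautifulSoup, but it has to be used because etree and minidom don't find the matching tags for some reason, even after prettify()ing with bs4.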
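If prefix matching on the class attribute is what's needed, one alternative is sketched below. It swaps BeautifulSoup for the standard-library HTML parser, which is not what the answer above uses. The `ClassPrefixCounter` name and the sample markup are made up for illustration; the real page structure is not known here.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class ClassPrefixCounter(HTMLParser):
    """Count start tags that have a class name beginning with a prefix."""

    def __init__(self, prefix):
        HTMLParser.__init__(self)
        self.prefix = prefix
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if any(name.startswith(self.prefix) for name in classes):
            self.count += 1


# Invented sample markup, not the actual YouTube page.
sample = (
    '<ul>'
    '<li class="comment yt-tile-default">first</li>'
    '<li class="comment yt-tile-other">second</li>'
    '<li class="sidebar">not a comment</li>'
    '</ul>'
)
counter = ClassPrefixCounter("comment")
counter.feed(sample)
matched = counter.count
```

Whether this is faster than BeautifulSoup on a page with thousands of comments has not been measured here.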

2012-10-10 20:24:50 TankorSmash

Hi, interesting answer, but I think the HTML structure has changed. Are you using a different tag instead of 'comment yt-tile-default'? Thanks! – Thoth

@Thoth I haven't used this in a while, but open up the dev tools and edit my answer if you find out. – TankorSmash
