
Answers

4

This is not yet available through the public API.

3

The only way I've found is to systematically scrape each post's permalink with browser automation like Selenium (with some logic to handle the display formats, e.g. '5.6k' views and '1,046' views) and pick out the right element. A simple GET request won't produce the desired DOM because of the page's lack-of-JavaScript detection.

In Python:

from bs4 import BeautifulSoup 
from selenium import webdriver 

def insertViews(posts):
    # Drive a headless browser so the JavaScript-rendered DOM is available.
    driver = webdriver.PhantomJS('<path-to-phantomjs-driver-ignoring-escapes>')
    views_span_dom_path = '._9jphp > span'

    for post in posts:
        post_type = post.get('Type')
        link = post.get('Link')
        views = post.get('Views')

        if post_type == 'video':
            driver.get(link)
            html = driver.page_source

            soup = BeautifulSoup(html, "lxml")
            views_string_results = soup.select(views_span_dom_path)
            if len(views_string_results) > 0:
                views_string = views_string_results[0].get_text()
                # Normalize the display formats, e.g. '5.6k' and '1,046'.
                if 'k' in views_string:
                    views = float(views_string.replace('k', '')) * 1000
                elif ',' in views_string:
                    views = float(views_string.replace(',', ''))
                else:
                    views = float(views_string)
            else:
                views = None
        else:
            views = None

        post['Views'] = views
    driver.quit()
    return posts
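
For reference, a minimal usage sketch, assuming each post is a dict with the 'Type', 'Link' and 'Views' keys the function above expects; the permalinks below are placeholders, not real URLs:

# Minimal usage sketch -- the permalinks are placeholders, and the dict keys
# mirror what insertViews() expects ('Type', 'Link', 'Views').
posts = [
    {'Type': 'video', 'Link': '<permalink-to-video-post>', 'Views': None},
    {'Type': 'status', 'Link': '<permalink-to-text-post>', 'Views': None},
]

posts = insertViews(posts)
for post in posts:
    print(post['Link'], post['Views'])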

The PhantomJS driver can be downloaded here.
