2013-02-26

I'd like to retrieve all the comments from CNN's comment system (for example, on http://edition.cnn.com/2013/02/25/tech/innovation/google-glass-privacy-andrew-keen/index.html?hpt=hp_c1). How can I get all the comments from Disqus?

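The comment system requires clicking "Load more" to see additional comments. I've tried parsing the HTML with PHP, but that can't get all the comments because they are loaded with JavaScript. Does anyone have a more convenient way to retrieve all the comments for a particular CNN URL?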

Has anyone managed to do this? Thanks in advance.

Answers

The Disqus API supports pagination through cursors that are returned in the JSON response. See http://disqus.com/api/docs/cursors/ for details on how cursors work.
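
Following cursors boils down to a loop: request a page, collect its posts, then request again with `cursor.next` for as long as `cursor.hasNext` is true. Below is a minimal JavaScript sketch of that loop. It assumes the Disqus response shape `{ code, cursor: { hasNext, next }, response: [...] }`, and `fetchPage` is a hypothetical function you supply that performs the HTTP request for a given cursor:

```javascript
// Collect every post of a thread by following Disqus-style cursors.
// ASSUMPTION: fetchPage(cursor) is a caller-supplied async function that requests
// one page (e.g. threads/listPosts.json with &cursor=...) and returns parsed JSON
// shaped like { code, cursor: { hasNext, next }, response: [...] }.
async function fetchAllPosts(fetchPage, maxPages = Infinity) {
  const posts = [];
  let cursor = ''; // an empty cursor requests the first page
  for (let page = 0; page < maxPages; page++) {
    const result = await fetchPage(cursor);
    if (result.code !== 0) {
      // On failure, "response" holds an error message instead of posts
      throw new Error(`Disqus API error ${result.code}: ${result.response}`);
    }
    posts.push(...result.response);
    if (!result.cursor.hasNext) break; // last page reached
    cursor = result.cursor.next; // opaque token pointing at the next page
  }
  return posts;
}
```

In Node 18+ or a browser, `fetchPage` could be something like `cursor => fetch(endpoint + encodeURIComponent(cursor)).then(r => r.json())`, where `endpoint` is a request URL ending in `&cursor=`. Injecting the fetch function keeps the pagination logic separate from the networking, which also makes it easy to test against canned pages.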

Since you mentioned PHP, something like this should get you started:

<?php
$apikey    = '<your key here>'; // get keys at http://disqus.com/api/ (public or secret both work for this endpoint)
$shortname = '<the disqus forum shortname>'; // defined in the page as var disqus_shortname = '...';
$thread    = 'link:<URL of thread>'; // IMPORTANT: the URL you're viewing isn't necessarily the one stored with the thread of comments
//$thread  = 'ident:<identifier of thread>'; // use this if 'link:' has no results; defined in var disqus_identifier = '...';
$limit     = 100; // max is 100 for this endpoint; 25 is the default

// Base URL; the cursor for each page is appended at the end.
$endpoint = 'https://disqus.com/api/3.0/threads/listPosts.json'
          . '?api_key=' . urlencode($apikey)
          . '&forum=' . urlencode($shortname)
          . '&thread=' . urlencode($thread)
          . '&limit=' . $limit
          . '&cursor=';

listcomments($endpoint, '', 0); // an empty cursor starts at the first page

function listcomments($endpoint, $cursor, $j) {

    // Standard cURL
    $session = curl_init($endpoint . urlencode($cursor));
    curl_setopt($session, CURLOPT_RETURNTRANSFER, true); // return the response body instead of just true on success
    $data = curl_exec($session);
    curl_close($session);

    // Decode JSON data
    $results = json_decode($data);
    if ($results === null) die('Error parsing json');
    if ($results->code !== 0) die('Disqus API error: ' . $results->response); // e.g. bad key or thread not found

    // Comments on this page
    foreach ($results->response as $comment) {
        $name    = $comment->author->name;
        $message = $comment->message; // don't overwrite $comment: createdAt is read from it below
        $created = $comment->createdAt;
        // Get more data...

        echo "<p>" . htmlspecialchars($name) . " wrote:<br/>";
        echo $message . "<br/>";
        echo $created . "</p>";
    }

    // Follow the cursor until Disqus reports there are no more pages.
    // To stop after 10 pages instead, use: if ($results->cursor->hasNext && $j < 9)
    if ($results->cursor->hasNext) {
        listcomments($endpoint, $results->cursor->next, $j + 1);
    }
}

?>
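
Concatenating the query string by hand makes it easy to drop a parameter or forget to URL-encode the thread link. Here is a sketch of the same request URL built with the standard `URL`/`URLSearchParams` APIs. It mirrors the parameters the PHP defines (`$apikey`, `$shortname`, `$thread`, `$limit`, `$cursor`). Passing the thread as `thread=link:...` follows the `$thread` value in that script and should be checked against the Disqus API docs. `buildListPostsUrl` is a hypothetical helper:

```javascript
// Hypothetical helper: builds a threads/listPosts.json request URL.
// URLSearchParams encodes each value, so a thread link containing
// ':' '/' '?' or '&' cannot break the query string.
function buildListPostsUrl({ apiKey, forum, thread, limit = 100, cursor = '' }) {
  const url = new URL('https://disqus.com/api/3.0/threads/listPosts.json');
  url.searchParams.set('api_key', apiKey);
  url.searchParams.set('forum', forum);
  // ASSUMPTION: 'link:<URL>' or 'ident:<identifier>', as in the PHP's $thread
  url.searchParams.set('thread', thread);
  url.searchParams.set('limit', String(Math.min(limit, 100))); // endpoint max is 100
  if (cursor) url.searchParams.set('cursor', cursor); // omitted for the first page
  return url.toString();
}
```

For example, `buildListPostsUrl({ apiKey: 'KEY', forum: 'cnn', thread: 'link:http://www.cnn.com/...' })` yields a URL whose `thread` value decodes back to exactly the string that was passed in.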

Thanks a lot! But what exactly do we need to put in $thread (the thread's URL) and $cursor? Also, are we limited to at most 100 comments? – 2013-02-26 09:15:23

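The thread URL is simply the URL of the page the comments are on. In this case it is http://www.cnn.com/2013/02/25/tech/innovation/google-glass-privacy-andrew-keen/index.html. The cursor value is taken from the API response and points to the next set of 100 comments. The script keeps going until there are no more comments. – 2013-02-27 05:39:14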

I set $shortname to 'cnn' (var disqus_shortname = 'cnn';) and $thread to 'link:', and left $cursor empty, but it fails with "Error parsing json". What am I missing? – 2013-02-27 07:11:14
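
On the question of a 100-comment cap: 100 is only the per-request maximum. The cursor loop keeps requesting pages, so a thread with N comments needs roughly ⌈N/100⌉ requests (for example, 1,234 comments take 13 requests). A small sketch for estimating that up front, e.g. to keep an eye on API rate limits; `requestsNeeded` is a hypothetical name:

```javascript
// Approximate number of page requests needed to read `total` comments
// at `perPage` per request. At least one request is always made, even
// for an empty thread.
function requestsNeeded(total, perPage = 100) {
  return Math.max(1, Math.ceil(total / perPage));
}

console.log(requestsNeeded(1234)); // 13
console.log(requestsNeeded(100)); // 1
console.log(requestsNeeded(0)); // 1
```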

Just an addition: to get the URL of the Disqus comments for the page they appear on, run this JavaScript in the web browser console:

var visit = (function () {
    // src of the Disqus iframe embedded in the page
    var url = document.querySelector('div#disqus_thread iframe').src;

    // Upgrade http:// to https:// (the iframe's src property is always an absolute URL)
    if (url.indexOf('http://') === 0) return 'https://' + url.slice('http://'.length);

    return url;
})();

The URL is now stored in the variable visit:

console.log(visit); 
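
The iframe URL printed above usually carries the values the PHP script needs as query parameters. The sketch below extracts them with the standard `URL` API. It assumes (this is not confirmed here) that the forum shortname is in `f`, the thread URL in `t_u`, and the thread identifier in `t_i`. Open the logged URL and check the real parameter names before relying on it. `threadParamsFromEmbedUrl` is a hypothetical helper:

```javascript
// Hypothetical helper: read forum/thread values out of a Disqus iframe URL.
// ASSUMPTION: parameter names f (forum shortname), t_u (thread URL) and
// t_i (thread identifier); verify them against the src you actually logged.
function threadParamsFromEmbedUrl(embedUrl) {
  const params = new URL(embedUrl).searchParams; // values come back decoded
  return {
    forum: params.get('f'), // null when a parameter is absent
    link: params.get('t_u'),
    ident: params.get('t_i'),
  };
}
```

The result maps onto `$shortname`, `'link:' . $link` and `'ident:' . $ident` in the PHP answer, which can help when a guessed shortname or link returns no thread.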

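I converted all of the data to UTF-8 JSON for you and saved it as a .txt file; you can find it here: link. The JSON contains several variables, but the one you need is 'data', which is a JavaScript array.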

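Iterate over each entry and split it on 'x == x'. The 'x == x' separator marks the username of the person who posted each captured comment. If an entry has only a name and no numeric user ID, the account is no longer active.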

To look up a user by ID, use a URL such as https://disqus.com/users/106222183, where the number is the user ID.
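
Here is a sketch of the splitting step described above. The layout of each entry is an assumption, since the file itself isn't reproduced here: it treats the text before the first 'x == x' as the author field and everything after it as the comment. `parseEntry` is a hypothetical helper:

```javascript
// Hypothetical parser for one entry of the 'data' array.
// ASSUMPTION: entries look like '<author>x == x<comment>', where <author> is
// either a numeric Disqus user ID or a plain display name.
function parseEntry(entry) {
  const [authorField, ...rest] = entry.split('x == x');
  const author = authorField.trim();
  const isNumericId = /^\d+$/.test(author);
  return {
    author,
    text: rest.join('x == x'), // re-join in case the comment itself contains the separator
    // Numeric IDs map to a profile page; a bare name means the account is no longer active.
    profileUrl: isNumericId ? `https://disqus.com/users/${author}` : null,
  };
}
```

If the real entries put the author somewhere other than the first field, only the destructuring line needs to change.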

Without the API:

#disqus_thread { 
    position: relative; 
    height: 300px; 
    background-color: #fff; 
    overflow: hidden; 
} 
#disqus_thread:after { 
    content: ""; 
    display: block; 
    height: 10px; 
    width: 100%; 
    position: absolute; 
    bottom: 0; 
    background: white; 
} 
#disqus_thread.loaded { 
    height: auto; 
} 
#disqus_thread.loaded:after{ 
    height:55px; 
} 
#disqus-load {
    text-align: center;
    color: #fff;
    padding: 11px 14px;
    font-size: 13px;
    font-weight: 500;
    display: block;
    border: none;
    background: rgba(29,47,58,.6);
    line-height: 1.1;
    border-radius: 3px;
    transition: background .2s;
    text-shadow: none;
    cursor: pointer;
}

<div class="disqus-comments"> 
    <div id='disqus_thread'></div> 
    <div id='disqus-load'>Load comments</div> 
</div> 

<script type="text/javascript">
// Requires jQuery on the page.
$(document).ready(function() {
    var disqus_shortname = 'testare-123'; // your forum shortname

    // Load the Disqus embed once; the CSS above keeps the thread collapsed to 300px.
    (function() {
        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
        dsq.src = 'https://' + disqus_shortname + '.disqus.com/embed.js';
        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
    })();

    // "Load comments" only expands the preview; embed.js is not fetched a second time.
    $('#disqus-load').on('click', function() {
        $(this).fadeOut();
        $('#disqus_thread').addClass('loaded');
    });
});
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript" rel="nofollow">comments powered by Disqus.</a></noscript>
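
If you want the comments to load only when the button is clicked, the key point is to inject `embed.js` exactly once. The sketch below shows a guard for that. The DOM step is passed in as `appendScript`, a hypothetical callback (for example, one that creates a `<script async>` element with the given `src` and appends it to `<head>`), so the logic can also run outside a browser:

```javascript
// Returns a function that injects the Disqus embed script the first time it
// is called and does nothing afterwards, so repeated clicks don't reload the thread.
function makeDisqusLoader(shortname, appendScript) {
  let loaded = false;
  return function load() {
    if (loaded) return false;
    loaded = true;
    appendScript(`https://${shortname}.disqus.com/embed.js`);
    return true; // true only on the call that actually injected the script
  };
}
```

In the page, you would create the loader once with `makeDisqusLoader('testare-123', ...)` and call it from the `#disqus-load` click handler, alongside adding the `loaded` class.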