2012-04-10 59 views
3

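Streaming API with tweepy only returns the second-to-last tweet, not the most recent one

I'm new not just to Python but to programming altogether, so I'd really appreciate your help!

I'm trying to collect all tweets matching a filter, using Tweepy with Twitter's streaming API.

I've filtered by user ID and confirmed that tweets are being collected in real time.

However, it seems that only the second-to-last tweet is collected in real time, rather than the latest tweet.

Can you guys help?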

import tweepy 
import webbrowser 
import time 
import sys 

consumer_key = 'xyz' 
consumer_secret = 'zyx' 


## Getting access key and secret 
auth = tweepy.OAuthHandler(consumer_key, consumer_secret) 
auth_url = auth.get_authorization_url() 
print 'From your browser, please click AUTHORIZE APP and then copy the unique PIN: ' 
webbrowser.open(auth_url) 
verifier = raw_input('PIN: ').strip() 
auth.get_access_token(verifier) 
access_key = auth.access_token.key 
access_secret = auth.access_token.secret 


## Authorizing account privileges 
auth.set_access_token(access_key, access_secret) 


## Get the local time 
localtime = time.asctime(time.localtime(time.time())) 


## Status changes 
api = tweepy.API(auth) 
api.update_status('It worked - Current time is %s' % localtime) 
print 'It worked - now go check your status!' 


## Filtering the firehose 
user = [] 
print 'Follow tweets from which user ID?' 
handle = raw_input(">") 
user.append(handle) 

keywords = [] 
print 'What keywords do you want to track? Separate with commas.' 
key = raw_input(">") 
keywords.append(key) 

class CustomStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # We'll simply print some values in a tab-delimited format
        # suitable for capturing to a flat file but you could opt
        # store them elsewhere, retweet select statuses, etc.
        try:
            print "%s\t%s\t%s\t%s" % (status.text,
                                      status.author.screen_name,
                                      status.created_at,
                                      status.source)
        except Exception, e:
            print >> sys.stderr, 'Encountered Exception:', e

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True  # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True  # Don't kill the stream

# Create a streaming API and set a timeout value of ??? seconds. 

streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout=None) 

# Optionally filter the statuses you want to track by providing a list 
# of users to "follow". 

print >> sys.stderr, "Filtering public timeline for %s" % keywords 

# Pass the list built above; a bare string would be split into characters.
streaming_api.filter(follow=user, track=keywords)

Answers

5

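I had this same problem. In my case the answer was not as simple as running python unbuffered, and I suspect that did not solve the original poster's problem either. The problem is actually in the tweepy package itself, in the _read_loop() function of a file called streaming.py. I think this file needs to be updated to reflect a change in the format in which Twitter outputs data from its API.

The solution for me was to download the latest tweepy code from GitHub, https://github.com/tweepy/tweepy, specifically the streaming.py file. You can look through the commit history of this file to see the recent changes made to try to resolve this issue.

I looked into the details of the tweepy class, and there was a problem with the way streaming.py reads the JSON tweet stream. I think it has to do with Twitter updating their streaming API to prefix each incoming status with its length in bytes. Long story short, here is the function I replaced in streaming.py to fix the problem.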

def _read_loop(self, resp):

    while self.running and not resp.isclosed():

        # Note: keep-alive newlines might be inserted before each length value.
        # read until we get a digit...
        c = '\n'
        while c == '\n' and self.running and not resp.isclosed():
            c = resp.read(1)
        delimited_string = c

        # read rest of delimiter length..
        d = ''
        while d != '\n' and self.running and not resp.isclosed():
            d = resp.read(1)
            delimited_string += d

        try:
            int_to_read = int(delimited_string)
            next_status_obj = resp.read(int_to_read)
            # print 'status_object = %s' % next_status_obj
            self._data(next_status_obj)
        except ValueError:
            pass

    if resp.isclosed():
        self.on_closed(resp)

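This solution also requires learning how to download the tweepy source code, modify it, and then install the modified library into Python. You do that by going into your top-level tweepy directory and typing something like sudo python setup.py install, depending on your system.

I also left a comment for the maintainers of this package on GitHub to let them know what's going on.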
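To see the framing that _read_loop() above deals with in isolation, here is a minimal sketch that runs without tweepy or a network connection. It assumes the stream looks the way the answer describes it: optional keep-alive newlines, then a decimal length, a newline, and exactly that many bytes of payload. The names read_delimited and frame are made up for this illustration and are not part of tweepy, and the sketch is written to run under Python 3, unlike the Python 2 code above.

```python
import io


def frame(payload):
    """Prefix a payload with its byte length and a newline (hypothetical helper)."""
    return str(len(payload)).encode("ascii") + b"\n" + payload


def read_delimited(stream):
    """Yield each payload from a length-delimited byte stream.

    Assumed framing: optional keep-alive newlines, an ASCII decimal
    length, a newline, then exactly that many bytes of payload.
    """
    while True:
        # Skip keep-alive newlines until the first character of a length.
        c = stream.read(1)
        while c in (b"\n", b"\r"):
            c = stream.read(1)
        if not c:
            return  # end of stream

        # Accumulate the rest of the length line, up to its newline.
        length_line = c
        while not length_line.endswith(b"\n"):
            d = stream.read(1)
            if not d:
                return  # stream ended in the middle of a length line
            length_line += d

        try:
            size = int(length_line.decode("ascii").strip())
        except ValueError:
            continue  # not a length line; skip it, like the loop above

        # Read exactly one payload, so nothing waits for the next message.
        yield stream.read(size)


if __name__ == "__main__":
    first = b'{"id": 1, "text": "older tweet"}'
    second = b'{"id": 2, "text": "newest tweet"}'
    raw = io.BytesIO(b"\n" + frame(first) + b"\n\n" + frame(second))
    for payload in read_delimited(raw):
        print(payload.decode("ascii"))
```

Because each payload is read by its announced length, the newest message is handed over as soon as its bytes arrive, instead of waiting for a later message to push it out.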

+3

I've forked their repo and put this fix in; I'm just waiting on the pull request. For the time being, you can get the fixed version here: https://github.com/robbrit/tweepy – robbrit 2012-05-18 13:30:11

+0

@robbrit - Thanks! I appreciate it. Has the pull request been merged? – snakesNbronies 2012-06-17 21:26:40

1

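This is a case of output buffering. Run python with -u (unbuffered) to prevent this from happening.

Alternatively, you can force the buffer to be flushed by calling sys.stdout.flush() after the print statement.

For more ideas, see this answer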
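As a rough illustration of the flush approach, the sketch below wraps the write and the flush in one small helper. The name emit is made up for this example. It writes to sys.stdout by default and accepts any file-like object, which makes it easy to try out without a live stream.

```python
import io
import sys


def emit(line, stream=None):
    """Write one line and flush at once so it is not held in a buffer."""
    if stream is None:
        stream = sys.stdout
    stream.write(line + "\n")
    stream.flush()


if __name__ == "__main__":
    # Goes to the terminal, or through a pipe, without waiting for more output.
    emit("some tweet text\tsome_user")

    # The same helper works with an in-memory buffer.
    buf = io.StringIO()
    emit("captured line", buf)
    print(repr(buf.getvalue()))
```

Inside on_status you could call a helper like this in place of the bare print statement, so each tweet shows up as soon as it is received.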

+0

Thanks! I knew it was something minor. – snakesNbronies 2012-04-10 14:20:44