2016-10-03 95 views

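Saving tweets to a Python dictionary

I want to analyze Twitter data. I have downloaded some tweets and saved them in a .txt file.

When I tried to extract useful information from the tweet data, I could not make any progress: for a beginner like me, it seems hard to pull out the tweet text, the location, and so on.

While searching on Google, I found that the information is easy to extract once the JSON is converted to a dictionary.

Now I want to convert my JSON data to a Python dictionary, but I don't know how to proceed.

Here is the code I used to save the tweets: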

import sys  # needed for sys.exit() below
import tweepy
import json
import jsonpickle

consumer_key = "*********" 
consumer_secret = "*******" 

access_token = "************" 
access_token_secret = "**********" 

# OAuthHandler accepts user access tokens; AppAuthHandler (app-only auth) has no set_access_token().
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)


# Make Tweepy wait (sleep) automatically when it hits the rate limit and resume once the window expires.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

if not api:
    print("Can't Authenticate")
    sys.exit(-1)


searchQuery = 'SomeHashtag' 
maxTweets = 10000000 # Some arbitrary large number 
tweetsPerQry = 100 
fName = 'file.txt' 

sinceId = None 
max_id = -1  # -1 means "no upper bound yet"; use an integer tweet ID to start below that tweet

tweetCount = 0 
print("Downloading max {0} tweets".format(maxTweets)) 
with open(fName, 'a') as f:
    while tweetCount < maxTweets:
        try:
            if max_id <= 0:
                # First request: no upper bound on the tweet ID yet.
                if not sinceId:
                    new_tweets = api.search(q=searchQuery, lang="en", count=tweetsPerQry)
                else:
                    new_tweets = api.search(q=searchQuery, lang="en", count=tweetsPerQry,
                                            since_id=sinceId)
            else:
                # Later requests: only ask for tweets older than the last one seen.
                if not sinceId:
                    new_tweets = api.search(q=searchQuery, lang="en", count=tweetsPerQry,
                                            max_id=str(max_id - 1))
                else:
                    new_tweets = api.search(q=searchQuery, lang="en", count=tweetsPerQry,
                                            max_id=str(max_id - 1),
                                            since_id=sinceId)

            if not new_tweets:
                print("No more tweets found")
                break
            # Write one JSON document per line.
            for tweet in new_tweets:
                f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n')

            tweetCount += len(new_tweets)
            print("Downloaded {0} tweets".format(tweetCount))
            max_id = new_tweets[-1].id
        except tweepy.TweepError as e:
            # Just exit on any error
            print("some error : " + str(e))
            break

    print("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fName))
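The paging idea in the loop above (fetch a page, remember the ID of the oldest tweet, then ask only for tweets below that ID) can be tried without network access. This is only a sketch: `fake_search` and `fetch_all` are hypothetical stand-ins written for illustration, not part of tweepy, and the "tweets" are plain integer IDs.

```python
def fake_search(all_ids, count, max_id=None):
    """Hypothetical stand-in for api.search: newest-first IDs, none above max_id."""
    eligible = [i for i in sorted(all_ids, reverse=True)
                if max_id is None or i <= max_id]
    return eligible[:count]


def fetch_all(all_ids, count):
    """Page backwards through the IDs, count items at a time."""
    collected = []
    max_id = None  # no upper bound on the first request
    while True:
        page = fake_search(all_ids, count, max_id)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        # The next page must be strictly older than the oldest ID seen so far.
        max_id = page[-1] - 1
    return collected


ids = fetch_all(range(1, 11), count=4)
print(ids)
```

Subtracting 1 from the oldest ID is what keeps the same tweet from being returned twice on consecutive pages.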
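What does your .txt file look like?

I edited the grammar of your question. Please check that it is still clear, and please add the requested information: the contents of the .txt file and enough code to test with.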

Answer

Judging from your code, you can simply read the file line by line and decode each line with the jsonpickle.decode method:

import jsonpickle

filename = 'file.txt'  # the file written by the download script
tweets = []
with open(filename) as f:
    for line in f:
        tweets.append(jsonpickle.decode(line))

I also think you can do without the third-party library and use the standard json module:

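import json

# Save: tweet._json is the plain dict behind the tweepy Status object.
with open(filename, 'w') as f:
    for tweet in new_tweets:
        f.write(json.dumps(tweet._json) + '\n')

# Load: each line becomes one Python dictionary.
tweets = []
with open(filename) as f:
    for line in f:
        tweets.append(json.loads(line))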
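Once each line has been loaded into a dictionary, pulling out the text or the location is ordinary dictionary access. The following is a minimal sketch, not a definitive recipe: `sample_line` is made-up data shaped like a heavily trimmed Twitter API v1.1 status, and `summarize` is a hypothetical helper name. Real tweets carry many more fields, and some of them (such as the user's location) may be empty.

```python
import json

# Hypothetical sample line, shaped like a heavily trimmed v1.1 status object.
sample_line = (
    '{"id": 1, "text": "hello #SomeHashtag", '
    '"user": {"screen_name": "someone", "location": "Kabul"}, '
    '"coordinates": null}'
)


def summarize(tweet):
    """Pick a few commonly used fields out of one tweet dictionary."""
    user = tweet.get("user") or {}  # tolerate a missing or null user
    return {
        "id": tweet.get("id"),
        "text": tweet.get("text"),
        "screen_name": user.get("screen_name"),
        "location": user.get("location"),
    }


tweet = json.loads(sample_line)
summary = summarize(tweet)
print(summary["text"], "|", summary["location"])
```

Using `.get()` instead of square brackets returns `None` for a missing key rather than raising `KeyError`, which is convenient when fields are not guaranteed to be present.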
When I tried to download the tweets with json.dumps() instead of jsonpickle, the error window showed some tweet data along with the error "is not JSON serializable". – Khurshid
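An error like this typically appears when json.dumps() is handed a custom object rather than a plain dict, which is most likely what happens if the tweepy Status object itself is passed in. The sketch below reproduces the situation without tweepy: `FakeStatus` is a hypothetical stand-in that, like the Status objects in the question's code, keeps the raw dictionary in a `_json` attribute.

```python
import json


class FakeStatus:
    """Hypothetical stand-in for a tweepy Status: a custom object wrapping a dict."""

    def __init__(self, payload):
        self._json = payload


status = FakeStatus({"id": 1, "text": "hello"})

# Serializing the object itself fails, because json only knows built-in types.
try:
    json.dumps(status)
    error = None
except TypeError as exc:
    error = str(exc)

# Serializing the underlying dict works and round-trips cleanly.
line = json.dumps(status._json)
restored = json.loads(line)

print(error)
print(restored["text"])
```

Under that assumption, writing `json.dumps(tweet._json)` in the download loop should avoid the error, which mirrors what the jsonpickle call in the question already does with `tweet._json`.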