2016-04-30 89 views
0

我试图重复存储在一个JSON文件Twitter的数据:JSONDecodeError当迭代叽叽喳喳数据

fname = 'test.json' 

with open(fname, 'r') as f: 
    for line in f: 
     tweet = json.loads(line)['text'] 
     print(tweet) 

它打印出文件中的第一推得很好,但,当它遍历第二次它给我一个JSONDecodeError:

JSONDecodeError: Expecting value: line 2 column 1 (char 1) 

我的JSON文件是650Mb的大小大约。

要获取Twitter数据,我使用了来自Twitter API的StreamListener

这里是一个窥见到我的JSON文件:

{"created_at":"Sun Apr 24 05:37:02 +0000 2016","id":724109877732204544,"id_str":"724109877732204544","text":"JONES RETURNS WITH A UNANIMOUS DECISION WIN IVER OVINCE SAINT PREUX! #UFC197 https:\/\/t.co\/KlfaAh9h21","source":"\u003ca href=\"http:\/\/instagram.com\" rel=\"nofollow\"\u003eInstagram\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":714389668633116672,"id_str":"714389668633116672","name":"Leon Doyle","screen_name":"TheLDPodcast","location":"Dublin, Ireland","url":"http:\/\/www.youtube.com","description":"A weekly\/bi-weekly podcast focused mainly around MMA, Boxing, fighting etc. With the occasional random topic.","protected":false,"verified":false,"followers_count":7,"friends_count":59,"listed_count":0,"favourites_count":3,"statuses_count":31,"created_at":"Mon Mar 28 09:52:24 +0000 2016","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"004455","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/714390864030797824\/REXXKCvs_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/714390864030797824\/REXXKCvs_normal.jpg","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"UFC197","indices":[69,76]}],"urls":[{"url":"https:\/\/t.co\/KlfaAh9h21","expanded_url":"https:\/\/www.instagram.com\/p\/BEkk6Gewpqy\/","display_url":"instagram.com\/p\/BEkk6Gewpqy\/","indices":[77,100]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1461476222819"} 

{"created_at":"Sun Apr 24 05:37:03 +0000 2016","id":724109879200366592,"id_str":"724109879200366592","text":"regrann from @ufc - #AndStill UFC flyweight champ @MightyMouseUFC! #UFC197\n\nPresented by\u2026 https:\/\/t.co\/zbE5CsFxMJ","source":"\u003ca href=\"http:\/\/instagram.com\" rel=\"nofollow\"\u003eInstagram\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":1070221260,"id_str":"1070221260","name":"Will Manuel","screen_name":"TheWillManuel","location":"Kenai, AK","url":null,"description":"Alaskan. Paramedic. Firefighter. Industrial Security. Libertarian. 2nd Amendment. Liberty. BJJ & Muay Thai novice. #TeamRed #RedemptionMMA #BJJ #MuayThai #MMA","protected":false,"verified":false,"followers_count":437,"friends_count":573,"listed_count":32,"favourites_count":2516,"statuses_count":3184,"created_at":"Tue Jan 08 07:22:47 +0000 2013","utc_offset":-28800,"time_zone":"Alaska","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/579042288040435713\/VeA-zI45.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/579042288040435713\/VeA-zI45.jpeg","profile_background_tile":true,"profile_link_color":"4A913C","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/715188796615237632\/JvxeLz8D_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/715188796615237632\/JvxeLz8D_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1070221260\/1447179132","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"AndStill","indices":[22,31]},{"text":"UFC197","indices":[69,76]}],"urls":[{"url":"https:\/\/t.co\/zbE5CsFxMJ","expanded_url":"https:\/\/www.instagram.com\/p\/BEkk6a0QMeX\/","display_url":"instagram.com\/p\/BEkk6a0QMeX\/","indices":[92,115]}],"user_mentions":[{"screen_name":"ufc","name":"#UFC197","id":6446742,"id_str":"6446742","indices":[13,17]},{"screen_name":"MightyMouseUFC","name":"Demetrious Johnson","id":140845817,"id_str":"140845817","indices":[52,67]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1461476223169"} 

{"created_at":"Sun Apr 24 05:37:03 +0000 2016","id":724109882341896192,"id_str":"724109882341896192","text":"RT @BESTFlGHTS: Jon Jones flips off Daniel Cormier at #UFC197 https:\/\/t.co\/S0pDvRWhfW","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":1019191860,"id_str":"1019191860","name":"Paul","screen_name":"Paulie_Frat","location":"Mount Pocono, PA","url":null,"description":"...","protected":false,"verified":false,"followers_count":272,"friends_count":259,"listed_count":0,"favourites_count":1580,"statuses_count":1622,"created_at":"Tue Dec 18 07:10:12 +0000 2012","utc_offset":-14400,"time_zone":"Eastern Time (US & Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"131516","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme14\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme14\/bg.gif","profile_background_tile":true,"profile_link_color":"009999","profile_sidebar_border_color":"EEEEEE","profile_sidebar_fill_color":"EFEFEF","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/512140999444164608\/4H2fiOtg_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/512140999444164608\/4H2fiOtg_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1019191860\/1461422809","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Sun Apr 24 05:12:13 +0000 2016","id":724103630702432256,"id_str":"724103630702432256","text":"Jon Jones flips off Daniel Cormier at #UFC197 https:\/\/t.co\/S0pDvRWhfW","source":"\u003ca href=\"http:\/\/bufferapp.com\" rel=\"nofollow\"\u003eBuffer\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":1370712786,"id_str":"1370712786","name":"BEST FIGHTS","screen_name":"BESTFlGHTS","location":"MMA, Boxing, Street Fights","url":"http:\/\/snapchat.com\/add\/wshhfans","description":"Parody, we do not own the content posted   DM's are open send me your fight","protected":false,"verified":false,"followers_count":156257,"friends_count":17861,"listed_count":83,"favourites_count":1,"statuses_count":6723,"created_at":"Sun Apr 21 22:43:19 +0000 2013","utc_offset":-25200,"time_zone":"Arizona","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"131516","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme14\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme14\/bg.gif","profile_background_tile":true,"profile_link_color":"ABB8C2","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"EFEFEF","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/620356388833734657\/NvmkmGDk_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/620356388833734657\/NvmkmGDk_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1370712786\/1460756748","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":740,"favorite_count":624,"entities":{"hashtags":[{"text":"UFC197","indices":[38,45]}],"urls":[{"url":"https:\/\/t.co\/S0pDvRWhfW","expanded_url":"http:\/\/vine.co\/v\/iU5T53X6U7J","display_url":"vine.co\/v\/iU5T53X6U7J","indices":[46,69]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"UFC197","indices":[54,61]}],"urls":[{"url":"https:\/\/t.co\/S0pDvRWhfW","expanded_url":"http:\/\/vine.co\/v\/iU5T53X6U7J","display_url":"vine.co\/v\/iU5T53X6U7J","indices":[62,85]}],"user_mentions":[{"screen_name":"BESTFlGHTS","name":"BEST FIGHTS","id":1370712786,"id_str":"1370712786","indices":[3,14]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1461476223918"} 

我怎样才能解决这个问题呢?

回答

0

如果您的JSON文件与您发布的作品具有完全相同的结构,推文之间的空行确实会导致JSONDecodeError。如果这是问题,请在处理之前检查该行是否为空:

In [12]: 

with open(fname, 'r') as f: 
    for line in f: 
     if (not line.strip()): 
      continue 
     tweet = json.loads(line)['text'] 
     print(tweet) 

希望它有帮助。

+0

谢谢,它的工作! – Sourdough