2017-05-03 114 views

Beginner: writing hashtags from JSON to a CSV file using Python

I am using the following code:

f = open("singletweetwithtimezone.json", "r") 
tweet_text = f.read() 

import json 

tweet_json = json.loads(tweet_text) 

g = open("Singletweetcsvoutput.csv", "w") 

g.write(tweet_json["created_at"]+"\t") 
g.write(tweet_json["user"]["time_zone"]+"\t") 
g.write(tweet_json["entities"]["hashtags"]["text"]) 

g.close() 
f.close() 

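The writing works, except for the hashtags. I want it to write the text 'messi' to the CSV file, but because of my lack of knowledge I can't figure out what I am doing wrong. I get the following error: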

g.write(tweet_json["entities"]["hashtags"]["text"]) 
TypeError: list indices must be integers, not str

I have added an image showing the JSON tree: what the tree looks like

Can anyone help me?

Raw JSON:

{"created_at":"Sun Apr 23 21:04:13 +0000 2017","id":856252394233106432,"id_str":"856252394233106432","text":"RT @11FC_FR: Et \u00e0 la fin ... #Messi \n\ud83d\ude0d https:\/\/t.co\/uiyTnJJiKd","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":1628829458,"id_str":"1628829458","name":"LK20","screen_name":"KeevinGruny","location":null,"url":null,"description":"\u26bd\ufe0f\u26bd\ufe0f","protected":false,"verified":false,"followers_count":903,"friends_count":209,"listed_count":39,"favourites_count":7328,"statuses_count":59480,"created_at":"Sun Jul 28 21:50:55 +0000 2013","utc_offset":10800,"time_zone":"Athens","geo_enabled":true,"lang":"fr","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/854838667428466688\/jE52U_LU_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/854838667428466688\/jE52U_LU_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1628829458\/1492857072","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Sun Apr 23 20:38:37 +0000 2017","id":856245950544838660,"id_str":"856245950544838660","text":"Et \u00e0 la 
fin ... #Messi \n\ud83d\ude0d https:\/\/t.co\/uiyTnJJiKd","display_text_range":[0,25],"source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2617563403,"id_str":"2617563403","name":"11FootballClub","screen_name":"11FC_FR","location":"3, All\u00e9e Cassard - NANTES","url":"http:\/\/www.11footballclub.com","description":"11FootballClub est un concept store unique et une boutique en ligne soign\u00e9e. Actus foot, nouveaut\u00e9s produits, promos et jeux concours","protected":false,"verified":false,"followers_count":49693,"friends_count":24561,"listed_count":54,"favourites_count":350,"statuses_count":1268,"created_at":"Fri Jul 11 15:21:20 +0000 2014","utc_offset":-25200,"time_zone":"Pacific Time (US & Canada)","geo_enabled":true,"lang":"fr","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"A16E1E","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/648580532322836480\/2lodFucd_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/648580532322836480\/2lodFucd_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/2617563403\/1443468740","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":155,
"favorite_count":92,"entities":{"hashtags":[{"text":"Messi","indices":[16,22]}],"urls":[],"user_mentions":[],"symbols":[],"media":[{"id":856245937915793408,"id_str":"856245937915793408","indices":[26,49],"media_url":"http:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","url":"https:\/\/t.co\/uiyTnJJiKd","display_url":"pic.twitter.com\/uiyTnJJiKd","expanded_url":"https:\/\/twitter.com\/11FC_FR\/status\/856245950544838660\/photo\/1","type":"photo","sizes":{"small":{"w":680,"h":460,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":1200,"h":812,"resize":"fit"},"large":{"w":2048,"h":1386,"resize":"fit"}}}]},"extended_entities":{"media":[{"id":856245937915793408,"id_str":"856245937915793408","indices":[26,49],"media_url":"http:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","url":"https:\/\/t.co\/uiyTnJJiKd","display_url":"pic.twitter.com\/uiyTnJJiKd","expanded_url":"https:\/\/twitter.com\/11FC_FR\/status\/856245950544838660\/photo\/1","type":"photo","sizes":{"small":{"w":680,"h":460,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":1200,"h":812,"resize":"fit"},"large":{"w":2048,"h":1386,"resize":"fit"}}}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"fr"},"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"Messi","indices":[29,35]}],"urls":[],"user_mentions":[{"screen_name":"11FC_FR","name":"11FootballClub","id":2617563403,"id_str":"2617563403","indices":[3,11]}],"symbols":[],"media":[{"id":856245937915793408,"id_str":"856245937915793408","indices":[39,62],"media_url":"http:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","url":"https:\/\/t.co\/uiyTnJJiKd","display_url":"pic.twitter.com\/uiyTnJJiKd","expanded_url":"https:\/\/twitter
.com\/11FC_FR\/status\/856245950544838660\/photo\/1","type":"photo","sizes":{"small":{"w":680,"h":460,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":1200,"h":812,"resize":"fit"},"large":{"w":2048,"h":1386,"resize":"fit"}},"source_status_id":856245950544838660,"source_status_id_str":"856245950544838660","source_user_id":2617563403,"source_user_id_str":"2617563403"}]},"extended_entities":{"media":[{"id":856245937915793408,"id_str":"856245937915793408","indices":[39,62],"media_url":"http:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C-H_JNnXkAApCFh.jpg","url":"https:\/\/t.co\/uiyTnJJiKd","display_url":"pic.twitter.com\/uiyTnJJiKd","expanded_url":"https:\/\/twitter.com\/11FC_FR\/status\/856245950544838660\/photo\/1","type":"photo","sizes":{"small":{"w":680,"h":460,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":1200,"h":812,"resize":"fit"},"large":{"w":2048,"h":1386,"resize":"fit"}},"source_status_id":856245950544838660,"source_status_id_str":"856245950544838660","source_user_id":2617563403,"source_user_id_str":"2617563403"}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"fr","timestamp_ms":"1492981453842"} 

Check the updated answer and test it. – ciacicode

Answer


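Start by printing the type of each node in the response. The error you are seeing occurs because you are trying to access everything in the response as if it were a dictionary key. Judging from the screenshot, hashtags is an array (a list in Python), so its first element has to be accessed by index: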

tweet_json['entities']['hashtags'][0]['text'] 
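As a minimal sketch of that type check, here is the same access on a trimmed-down stand-in for the tweet. The `sample` dictionary and the `first_hashtag` name are assumptions made for illustration; only the fields used in the question are kept.

```python
import json

# Trimmed-down stand-in for the tweet in the question (illustrative only).
sample = json.loads(
    '{"created_at": "Sun Apr 23 21:04:13 +0000 2017",'
    ' "user": {"time_zone": "Athens"},'
    ' "entities": {"hashtags": [{"text": "Messi", "indices": [29, 35]}]}}'
)

entities = sample["entities"]
hashtags = entities["hashtags"]

print(type(entities))     # a dict: index it with string keys
print(type(hashtags))     # a list: index it with integers
print(type(hashtags[0]))  # a dict again, holding "text" and "indices"

# Guard against tweets without hashtags before taking the first one.
first_hashtag = hashtags[0]["text"] if hashtags else ""
print(first_hashtag)
```

Indexing `hashtags` with the string "text" is what raises the TypeError, because a list only accepts integer positions.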

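hashtags contains an array. In this case the array has length 1, so you can access its element with [0]. Because these arrays vary in length (and can be empty), you should either check the length first or loop over the array, as shown below. I like to use DictWriter from the csv library; it is overkill for a single tweet, but it can be reused to go through multiple tweets.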

import csv
import json

# read the raw tweet and parse it into a dictionary
with open('.../input.json', 'r') as inputfile:
    tweet = inputfile.read()

tweet_json = json.loads(tweet)

with open('.../output.csv', 'w') as csvfile:
    fieldnames = ['created_at', 'user', 'hashtags']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    # extract all the info that you want to write
    created_at = tweet_json['created_at']
    # select the screen_name of the user rather than the id
    user = tweet_json['user']['screen_name']
    hashtags = tweet_json['entities']['hashtags']
    # create an empty list to collect the text of each hashtag
    hashes = list()
    for hashtag in hashtags:
        text = hashtag['text']
        # append the hashtag text to the hashes list
        hashes.append(text)
    # stringify the list and write it to the file (will be ugly)
    writer.writerow({"created_at": created_at, "user": user, "hashtags": str(hashes)})
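To illustrate how the same DictWriter approach could go through multiple tweets, here is a rough sketch. It assumes Python 3, writes to an in-memory buffer instead of a file, and joins the hashtags with spaces rather than stringifying the list. The helper names `hashtag_texts` and `write_tweets` and the second tweet are made up for the example.

```python
import csv
import io


def hashtag_texts(tweet):
    """Return the hashtag texts of one tweet, or an empty list if there are none."""
    return [tag["text"] for tag in tweet.get("entities", {}).get("hashtags", [])]


def write_tweets(tweets, csvfile):
    """Write a header plus one CSV row per tweet to an open text file object."""
    fieldnames = ["created_at", "user", "hashtags"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for tweet in tweets:
        writer.writerow({
            "created_at": tweet["created_at"],
            "user": tweet["user"]["screen_name"],
            # join the texts so the cell reads "Messi" instead of "['Messi']"
            "hashtags": " ".join(hashtag_texts(tweet)),
        })


# The first tweet mirrors the one in the question; the second is invented
# to show what happens when a tweet has no hashtags.
tweets = [
    {
        "created_at": "Sun Apr 23 21:04:13 +0000 2017",
        "user": {"screen_name": "KeevinGruny"},
        "entities": {"hashtags": [{"text": "Messi", "indices": [29, 35]}]},
    },
    {
        "created_at": "Mon Apr 24 08:00:00 +0000 2017",
        "user": {"screen_name": "example_user"},
        "entities": {"hashtags": []},
    },
]

buffer = io.StringIO()
write_tweets(tweets, buffer)
print(buffer.getvalue())
```

Because the loop simply produces no text for an empty hashtags array, no separate length check is needed; the hashtags cell is left empty for that row.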

Thanks, here is the raw code, added to the question. – MaroYKW


Thanks for the update! What would the complete code look like? I can't get it to work :'). – MaroYKW


@MaroYKW check it out – ciacicode