2017-10-15 325 views
1

Unable to write a CSV file with Python. I am trying to convert a JSON file to a CSV file. The JSON file comes from tweepy.

import json 
import csv 

fo = open('Sclass.json', 'r') 
fw = open('Hasil_Tweets.csv', 'a') 

for line in fo: 
     try: 
       tweet = json.loads(line) 
       fw.write(tweet['id'],tweet['timestamp_ms'],tweet['user']['name'],tweet['user']['statuses_count'],tweet['user']['friends_count'],tweet['user']['followers_count'],tweet['place']['bounding_box']['coordinates'],tweet['text']+"\n") 
     except: 
       continue 

However, when I print the values, it works. It also works when I write only fw.write(tweet['text']).

Thanks

I am a beginner with both Python and tweepy, but my gut feeling is that the problem lies in the JSON file itself. Sorry for my bad English. Here is the JSON file itself:

{ 
    "created_at": "Wed Oct 11 08:36:21 +0000 2017", 
    "id": 918032510927355904, 
    "id_str": "918032510927355904", 
    "text": "@irfanzayo @puisisi @tasyak Lo tuh kebiasaan overthinking \ud83d\ude24", 
    "display_text_range": [ 
     28, 
     59 
    ], 
    "source": "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>", 
    "truncated": false, 
    "in_reply_to_status_id": 918032029094047746, 
    "in_reply_to_status_id_str": "918032029094047746", 
    "in_reply_to_user_id": 60049976, 
    "in_reply_to_user_id_str": "60049976", 
    "in_reply_to_screen_name": "irfanzayo", 
    "user": { 
     "id": 59980455, 
     "id_str": "59980455", 
     "name": "Mutiara Sisyanni D", 
     "screen_name": "MutiaraSisyanni", 
     "location": "Jakarta, Indonesia", 
     "url": "http://mutiarasyn.wixsite.com/mutiarasisyanni", 
     "description": null, 
     "translator_type": "none", 
     "protected": false, 
     "verified": false, 
     "followers_count": 354, 
     "friends_count": 237, 
     "listed_count": 1, 
     "favourites_count": 326, 
     "statuses_count": 6507, 
     "created_at": "Sat Jul 25 04:31:47 +0000 2009", 
     "utc_offset": 25200, 
     "time_zone": "Jakarta", 
     "geo_enabled": true, 
     "lang": "en", 
     "contributors_enabled": false, 
     "is_translator": false, 
     "profile_background_color": "FA8C9E", 
     "profile_background_image_url": "http://abs.twimg.com/images/themes/theme5/bg.gif", 
     "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme5/bg.gif", 
     "profile_background_tile": false, 
     "profile_link_color": "FF8A94", 
     "profile_sidebar_border_color": "FFFFFF", 
     "profile_sidebar_fill_color": "99CC33", 
     "profile_text_color": "3E4415", 
     "profile_use_background_image": false, 
     "profile_image_url": "http://pbs.twimg.com/profile_images/486497248293826560/FANdzhL9_normal.jpeg", 
     "profile_image_url_https": "https://pbs.twimg.com/profile_images/486497248293826560/FANdzhL9_normal.jpeg", 
     "profile_banner_url": "https://pbs.twimg.com/profile_banners/59980455/1404826066", 
     "default_profile": false, 
     "default_profile_image": false, 
     "following": null, 
     "follow_request_sent": null, 
     "notifications": null 
    }, 
    "geo": null, 
    "coordinates": null, 
    "place": { 
     "id": "66555622726ab358", 
     "url": "https://api.twitter.com/1.1/geo/id/66555622726ab358.json", 
     "place_type": "city", 
     "name": "Setia Budi", 
     "full_name": "Setia Budi, Indonesia", 
     "country_code": "ID", 
     "country": "Indonesia", 
     "bounding_box": { 
      "type": "Polygon", 
      "coordinates": [ 
       [ 
        [ 
         106.817351, 
         -6.24152 
        ], 
        [ 
         106.817351, 
         -6.201177 
        ], 
        [ 
         106.852353, 
         -6.201177 
        ], 
        [ 
         106.852353, 
         -6.24152 
        ] 
       ] 
      ] 
     }, 
     "attributes": {} 
    }, 
    "contributors": null, 
    "is_quote_status": false, 
    "quote_count": 0, 
    "reply_count": 0, 
    "retweet_count": 0, 
    "favorite_count": 0, 
    "entities": { 
     "hashtags": [], 
     "urls": [], 
     "user_mentions": [ 
      { 
       "screen_name": "irfanzayo", 
       "name": "irfan zayanto", 
       "id": 60049976, 
       "id_str": "60049976", 
       "indices": [ 
        0, 
        10 
       ] 
      }, 
      { 
       "screen_name": "puisisi", 
       "name": "Puisi Pancara", 
       "id": 32809069, 
       "id_str": "32809069", 
       "indices": [ 
        11, 
        19 
       ] 
      }, 
      { 
       "screen_name": "tasyak", 
       "name": "Tasya Kurnia", 
       "id": 41986880, 
       "id_str": "41986880", 
       "indices": [ 
        20, 
        27 
       ] 
      } 
     ], 
     "symbols": [] 
    }, 
    "favorited": false, 
    "retweeted": false, 
    "filter_level": "low", 
    "lang": "in", 
    "timestamp_ms": "1507710981481" 
} 

Another error

Traceback (most recent call last): 
    File "C:\Users\User\Desktop\fase 1-20170930T062552Z-001\transformCSV.py", line 7, in <module> 
    tweet = json.loads(line) 
    File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads 
    return _default_decoder.decode(s) 
    File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 339, in decode 
    obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 
    File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 357, in raw_decode 
    raise JSONDecodeError("Expecting value", s, err.value) from None 
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

Traceback (most recent call last): 
    File "C:\Users\Tanabata\Desktop\Putang ina mo\spli.py", line 8, in <module> 
    tweet = json.load(fo) 
    File "C:\Users\Tanabata\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 299, in load 
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw) 
    File "C:\Users\Tanabata\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads 
    return _default_decoder.decode(s) 
    File "C:\Users\Tanabata\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 342, in decode 
    raise JSONDecodeError("Extra data", s, end) 
json.decoder.JSONDecodeError: Extra data: line 3 column 1 (char 2893) 

The JSON file itself: http://www.mediafire.com/file/l3rzzbe0nbu1nlu/Sclass.json

Answers

0

You import csv but never actually use it. You have to create a writer:

import json 
import csv 

# newline='' lets the csv module control line endings (avoids blank rows on Windows) 
with open('Sclass.json', 'r') as fo, open('Hasil_Tweets.csv', 'a', newline='') as fw: 
    writer = csv.writer(fw) 
    for line in fo: 
        tweet = json.loads(line) 
        # writerow() takes a single list holding all fields of the row 
        writer.writerow([tweet['id'], tweet['timestamp_ms'], tweet['user']['name'], 
                         tweet['user']['statuses_count'], tweet['user']['friends_count'], 
                         tweet['user']['followers_count'], 
                         tweet['place']['bounding_box']['coordinates'], tweet['text']]) 
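
For context on why the loop in the question produced an empty file: write() on a file object accepts exactly one string, so passing eight arguments raises a TypeError, and the bare "except: continue" silently discarded that error for every line. A minimal sketch that reproduces this with an in-memory buffer (the sample values are made up for illustration):

```python
import csv
import io

buf = io.StringIO()

# file.write() takes a single string; several arguments raise TypeError.
try:
    buf.write("918032510927355904", "1507710981481", "some text\n")
except TypeError as exc:
    error_message = str(exc)  # a bare `except:` would have hidden this

# csv.writer.writerow() takes ONE iterable holding all fields of a row.
writer = csv.writer(buf)
writer.writerow([918032510927355904, "1507710981481", "text, with a comma"])

print(error_message)
print(buf.getvalue())
```

Catching a specific exception and printing it, instead of a bare except, would have surfaced the problem immediately.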

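Regarding your second problem: it now looks as if you do not have a JSON Lines file (one object per line) but a file holding a single JSON document. Read it in one go: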

with open('Sclass.json', 'r') as fo: 
    tweet = json.load(fo)  # parse the whole file as one JSON document 

with open('Hasil_Tweets.csv', 'a', newline='') as fw: 
    writer = csv.writer(fw) 
    writer.writerow([tweet['id'], tweet['timestamp_ms'], tweet['user']['name'], 
                     tweet['user']['statuses_count'], tweet['user']['friends_count'], 
                     tweet['user']['followers_count'], 
                     tweet['place']['bounding_box']['coordinates'], tweet['text']]) 
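
The row-building logic can also be factored into a helper that tolerates tweets without location data. The following is a sketch, not part of the original answer: tweet_to_row and HEADER are hypothetical names, and it assumes "place" may be null in some tweets, just as "geo" and "coordinates" are null in the sample tweet from the question.

```python
import csv
import io
import json

HEADER = ["id", "timestamp_ms", "name", "statuses_count",
          "friends_count", "followers_count", "coordinates", "text"]

def tweet_to_row(tweet):
    """Flatten one decoded tweet (a dict) into a list of CSV fields."""
    user = tweet.get("user") or {}
    place = tweet.get("place") or {}        # assumption: "place" may be null
    box = place.get("bounding_box") or {}
    coords = box.get("coordinates")
    return [
        tweet.get("id"),
        tweet.get("timestamp_ms"),
        user.get("name"),
        user.get("statuses_count"),
        user.get("friends_count"),
        user.get("followers_count"),
        json.dumps(coords) if coords is not None else "",
        tweet.get("text"),
    ]

# Demonstration with an in-memory file and two made-up tweets.
sample = [
    {"id": 1, "timestamp_ms": "10", "text": "hello",
     "user": {"name": "A", "statuses_count": 2,
              "friends_count": 3, "followers_count": 4},
     "place": {"bounding_box": {"coordinates": [[[106.8, -6.2]]]}}},
    {"id": 2, "timestamp_ms": "20", "text": "no place",
     "user": {"name": "B"}, "place": None},
]
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(HEADER)                      # header row, written once
writer.writerows(tweet_to_row(t) for t in sample)
print(out.getvalue())
```

Serialising the nested coordinate list with json.dumps keeps it in a single CSV cell that can be parsed back later.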
+0

Thank you, sir, but I got another error: AttributeError: '_csv.writer' object has no attribute 'write' – Tanabata

+0

Sorry, it should be 'writerow'. The answer has been corrected. – Daniel

+0

Thank you, sir, but there is another error: writerow() takes exactly one argument (8 given). Sorry to bother you, sir. – Tanabata

0

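Since you are working with tables (a CSV file is one), consider pandas. Also, I think reading the file line by line is not right here; you should read the file as a whole.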

In this case, we can use pandas' json_normalize to interpret your JSON file.

import json 
import pandas as pd 

with open("Sclass.json") as f: 
    # pd.json_normalize flattens nested keys into dotted column names 
    # (pandas < 1.0: from pandas.io.json import json_normalize) 
    df = pd.json_normalize(json.load(f)) 

cols = ["id", "timestamp_ms", "user.name", 
        "user.statuses_count", "user.friends_count", "user.followers_count", 
        "place.bounding_box.coordinates", "text"] 

df[cols].to_csv("Hasil_Tweets.csv", sep=",", index=False)  # outputs to csv 

pandas comes with many output options, one of which is an HTML table. I will use it to show the output:

print(df[cols].to_html(index=False)) # outputs to html to show result 

Output

<table border="1" class="dataframe"> 
    <thead> 
    <tr style="text-align: right;"> 
     <th>id</th> 
     <th>timestamp_ms</th> 
     <th>user.name</th> 
     <th>user.statuses_count</th> 
     <th>user.friends_count</th> 
     <th>user.followers_count</th> 
     <th>place.bounding_box.coordinates</th> 
     <th>text</th> 
    </tr> 
    </thead> 
    <tbody> 
    <tr> 
     <td>918032510927355904</td> 
     <td>1507710981481</td> 
     <td>Mutiara Sisyanni D</td> 
     <td>6507</td> 
     <td>237</td> 
     <td>354</td> 
     <td>[[[106.817351, -6.24152], [106.817351, -6.2011...</td> 
     <td>@irfanzayo @puisisi @tasyak Lo tuh kebiasaan o...</td> 
    </tr> 
    </tbody> 
</table>

+0

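Thank you, sir. But I get some problems here: 1. raise JSONDecodeError("Extra data", s, end) 2. json.decoder.JSONDecodeError: Extra data: line 3 column 1 (char 2893). What is this error about? How come you are able to extract the data and I am not? >.< – Tanabata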

+0

@Tanabata Hmm.. I took the text you posted and created a JSON file with the same name. Do you perhaps have more data? I usually use jsonlint.com to check for errors. –

+0

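Thank you, sir. I have tried validating the JSON, and the result is that many things are incorrect. I also tried other JSON files from https://nocodewebscraping.com/twitter-json-examples/, which has JSON files, and they give exactly the same error: raise JSONDecodeError("Extra data", s, end) json.decoder.JSONDecodeError: Extra data: line 89 column 2 (char 3438). The only differences are the line, column and char numbers. Oh, and I have posted my JSON file in the question above. Thank you, sir – Tanabata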

0

I am adding this as another answer.

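The *.json you shared is actually one big file containing multiple JSON strings, but only on every second line. How you ended up with this file in the first place I don't know, but you can read it like this: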

import json 
import pandas as pd 

with open("Sclass.json") as f: 
    data = [json.loads(row.strip()) for row in f.readlines()[0::2]] 
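
The slice [0::2] assumes the blank lines fall on exactly every second line. A slightly more defensive sketch (the helper name read_json_lines is hypothetical) skips empty lines wherever they occur; it also shows why calling json.loads on a blank line raises the "Expecting value" error from the question:

```python
import json
import os
import tempfile

def read_json_lines(path):
    """Return one decoded object per non-empty line of the file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:      # json.loads("") raises "Expecting value"
                continue
            records.append(json.loads(line))
    return records

# Demonstration: a temporary file with a blank line after each object.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write('{"id": 1, "text": "a"}\n\n{"id": 2, "text": "b"}\n\n')
    demo_path = tmp.name

data = read_json_lines(demo_path)
os.remove(demo_path)
print([row["id"] for row in data])
```

Calling json.load on such a file fails with "Extra data" because the parser finds a second object after the first complete one.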

However, when you read this structure into a DataFrame, you can see that it really does not have any clear structure:

pd.DataFrame(data) 

Conclusion: your problem is something else entirely.

+0

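Thank you, sir. So the JSON file is not well structured, sir? And if you don't mind, could you correct my StreamListener so that I can get a proper JSON file or CSV file? This is my code: def on_data(self, data): try: with open('tescsv.json', 'a') as f: f.write(data) return True except BaseException as e: print("Error on_data: %s" % str(e)) return True def on_error(self, status): print(status) return True – Tanabata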
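
One possible way to address the comment above, offered as a sketch rather than a confirmed fix: if the blank lines in the file come from trailing newline characters in each payload passed to on_data, stripping the payload and writing exactly one newline gives a file with one JSON object per line. The helper append_tweet below is hypothetical and independent of tweepy; it only illustrates the file-writing step that on_data would perform.

```python
import json
import os
import tempfile

def append_tweet(path, data):
    """Append one raw JSON payload to path as a single line.

    Returns False for empty or undecodable payloads, True otherwise.
    """
    data = data.strip()               # drop trailing "\r\n" and similar
    if not data:
        return False
    try:
        json.loads(data)              # keep only payloads that parse
    except ValueError:
        return False
    # newline="\n" stops Windows text mode from expanding line endings
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        f.write(data + "\n")
    return True

# Demonstration with made-up payloads, including an empty one.
demo_path = os.path.join(tempfile.mkdtemp(), "tweets.json")
for payload in ['{"id": 1}\r\n', "\r\n", '{"id": 2}\r\n']:
    append_tweet(demo_path, payload)

with open(demo_path, encoding="utf-8") as f:
    ids = [json.loads(line)["id"] for line in f]
print(ids)
```

A file written this way can be read line by line with json.loads, without skipping every second line.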