2017-04-05 154 views

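JSON file to pandas DF

I want to convert a JSON file into a pandas DataFrame, remove the unneeded data, and end up with a CSV limited to the IDs. The JSON file looks like this: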

{ 
    "data": [ 
    { 
     "message": "Uneeded message", 
     "created_time": "2017-04-02T17:20:37+0000", 
     "id": "723456782912449_1008262099345654" 
    }, 
    { 
     "message": "Uneeded message", 
     "created_time": "2017-03-28T06:26:28+0000", 
     "id": "771345678912449_1003934567871010" 
    }, 

I haven't used JSON before, but the code I use to load this data is:

import pandas as pd 
import json 

with open('fileName.json', encoding="utf8") as f: 
    w = json.loads(f.read(), strict=False) 
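
For anyone wondering what strict=False does in the loading code above: it tells the json decoder to accept raw control characters (tab, newline and so on) inside strings, which strict JSON rejects. A minimal sketch, assuming a made-up payload whose message contains a literal newline:

```python
import json

# Hypothetical payload (not from the real file): the message holds a raw
# newline character, which strict JSON does not allow inside a string.
raw = '{"data": [{"message": "line one\nline two", "id": "1_2"}]}'

try:
    json.loads(raw)            # default strict=True
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False          # the raw newline is rejected

# With strict=False the same text parses and the newline is kept as-is.
lenient = json.loads(raw, strict=False)
print(strict_ok, lenient["data"][0]["id"])
```

If the file is clean JSON, plain json.load(f) behaves the same, so the flag only matters for messages with embedded control characters.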

The final output should be just a CSV of the IDs.

Answers


I think you need json_normalize:

from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is the older, deprecated location
import json 

with open('file.json') as data_file:  
    d = json.load(data_file) 

print (d) 
{ 
    "data": [{ 
     "message": "Uneeded message", 
     "created_time": "2017-04-02T17:20:37+0000", 
     "id": "723456782912449_1008262099345654" 
    }, { 
     "message": "Uneeded message", 
     "created_time": "2017-03-28T06:26:28+0000", 
     "id": "771345678912449_1003934567871010" 
    }] 
} 

df = json_normalize(d, 'data') 
print (df) 
               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
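
The DataFrame above still has all three columns, while the question asks for a CSV with only the IDs. A minimal sketch of that last step, assuming pandas 1.0 or newer (for pd.json_normalize) and a hypothetical output file name ids.csv:

```python
import json

import pandas as pd

# Same structure as the sample in the question, inlined so the sketch runs on its own.
d = json.loads("""{"data": [
  {"message": "Uneeded message",
   "created_time": "2017-04-02T17:20:37+0000",
   "id": "723456782912449_1008262099345654"},
  {"message": "Uneeded message",
   "created_time": "2017-03-28T06:26:28+0000",
   "id": "771345678912449_1003934567871010"}
]}""")

df = pd.json_normalize(d, 'data')

# Keep only the id column; the double brackets keep the result a DataFrame.
ids = df[['id']]

# 'ids.csv' is an assumed name. index=False leaves out the 0, 1, ... row labels.
ids.to_csv('ids.csv', index=False)
```

The resulting file should hold a header line, id, followed by one ID per line. Selecting the column before writing avoids having to drop message and created_time one by one.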

This works great, thank you! – J3319


Use json.loads

Setup

json_str = """{ 
"data": [ 
     { 
      "message": "Uneeded message", 
      "created_time": "2017-04-02T17:20:37+0000", 
      "id": "723456782912449_1008262099345654" 
     }, 
     { 
      "message": "Uneeded message", 
      "created_time": "2017-03-28T06:26:28+0000", 
      "id": "771345678912449_1003934567871010" 
     }]}""" 

Solution

import json 
import pandas as pd 

pd.DataFrame(json.loads(json_str)['data']) 

               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message

Or, with the JSON in a file:

with open('neutraluk1.json') as f: 
    print(pd.DataFrame(json.load(f)['data'])) 

               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message