2016-12-16 200 views
1

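Converting nested JSON to a CSV file in Python

I know this question has been asked many times. I have tried several of the suggested solutions, but none of them solved my problem.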

I have a large nested JSON file (1.4 GB) that I want to flatten and then convert to a CSV file.

The JSON structure looks like this:

{
    "company_number": "12345678",
    "data": {
        "address": {
            "address_line_1": "Address 1",
            "locality": "Henley-On-Thames",
            "postal_code": "RG9 1DP",
            "premises": "161",
            "region": "Oxfordshire"
        },
        "country_of_residence": "England",
        "date_of_birth": {
            "month": 2,
            "year": 1977
        },
        "etag": "26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00",
        "kind": "individual-person-with-significant-control",
        "links": {
            "self": "/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl"
        },
        "name": "John M Smith",
        "name_elements": {
            "forename": "John",
            "middle_name": "M",
            "surname": "Smith",
            "title": "Mrs"
        },
        "nationality": "Vietnamese",
        "natures_of_control": [
            "ownership-of-shares-50-to-75-percent"
        ],
        "notified_on": "2016-04-06"
    }
}

I know this is easy to do with the pandas module, but I am not familiar with it.

EDITED

The desired output should look like this:

company_number, address_line_1, locality, country_of_residence, kind
12345678, Address 1, Henley-On-Thames, England, individual-person-with-significant-control

Note that this is only a shortened version. The output should contain all of the fields.

+0

Can you show the desired output? – zipa

+0

I have edited my post – Porjaz

+0

First you would have to sort out the error yourself... but I don't get any error, and the JSON loads fine – Matthias

Answers

1

You can do this by walking the JSON structure and returning a list of all the leaf nodes, as follows:

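import json
import csv

def get_leaves(item, key=None):
    # Recursively collect a (key, value) pair for every leaf in the structure.
    if isinstance(item, dict):
        leaves = []
        for i in item.keys():
            leaves.extend(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        # List members inherit the key of the list that contains them.
        leaves = []
        for i in item:
            leaves.extend(get_leaves(i, key))
        return leaves
    else:
        return [(key, item)]


# 'wb' is the Python 2 way of opening a CSV file for writing.
with open('json.txt') as f_input, open('output.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    write_header = True

    # Assumes the file holds a JSON list of entries; json.load() reads all of it into memory.
    for entry in json.load(f_input):
        # Sort by key so that every row uses the same column order.
        leaf_entries = sorted(get_leaves(entry))

        if write_header:
            csv_output.writerow([k for k, v in leaf_entries])
            write_header = False

        csv_output.writerow([v for k, v in leaf_entries])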

If your JSON data is a list of entries in the format you gave, you should get output like the following:

address_line_1,company_number,country_of_residence,etag,forename,kind,locality,middle_name,month,name,nationality,natures_of_control,notified_on,postal_code,premises,region,self,surname,title,year 
Address 1,12345678,England,26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00,John,individual-person-with-significant-control,Henley-On-Thames,M,2,John M Smith,Vietnamese,ownership-of-shares-50-to-75-percent,2016-04-06,RG9 1DP,161,Oxfordshire,/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl,Smith,Mrs,1977 
Address 1,12345679,England,26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00,John,individual-person-with-significant-control,Henley-On-Thames,M,2,John M Smith,Vietnamese,ownership-of-shares-50-to-75-percent,2016-04-06,RG9 1DP,161,Oxfordshire,/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl,Smith,Mrs,1977 

Note: if you are using Python 3.x, change the with open(...) line to the following:

with open('json.txt', newline='') as f_input, open('output.csv', 'w', newline='') as f_output: 
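
For a 1.4 GB input, loading the whole file with json.load() may be more than memory allows. The following is a minimal Python 3 sketch of a streaming variant. It rests on an assumption the question does not confirm: that the file holds one JSON object per line rather than a single JSON list. The function name convert_json_lines and the file paths are illustrative only.

```python
import csv
import json


def get_leaves(item, key=None):
    """Return a (key, value) pair for every leaf found under item."""
    if isinstance(item, dict):
        leaves = []
        for k, v in item.items():
            leaves.extend(get_leaves(v, k))
        return leaves
    if isinstance(item, list):
        # List members inherit the key of the list that contains them.
        leaves = []
        for v in item:
            leaves.extend(get_leaves(v, key))
        return leaves
    return [(key, item)]


def convert_json_lines(in_path, out_path):
    """Stream a file with one JSON object per line (assumed layout) to CSV."""
    with open(in_path, encoding='utf-8') as f_input, \
            open(out_path, 'w', newline='', encoding='utf-8') as f_output:
        csv_output = csv.writer(f_output)
        write_header = True

        for line in f_input:
            line = line.strip()
            if not line:
                continue  # skip blank lines

            # Only one record is held in memory at a time.
            entry = json.loads(line)
            leaf_entries = sorted(get_leaves(entry), key=lambda kv: kv[0])

            if write_header:
                csv_output.writerow([k for k, v in leaf_entries])
                write_header = False

            csv_output.writerow([v for k, v in leaf_entries])
```

It would be called as convert_json_lines('json.txt', 'output.csv'). Like the script above, it takes the header from the first record only, so it still expects every record to have the same set of fields.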
+0

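I think this could cause problems if the nested keys are not consistent throughout the JSON file. If one of the structures is missing a field, the data in that row will be shifted. –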
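
One possible way around the shifting described in this comment is to write rows by column name instead of by position. The sketch below is not the answer's method: it flattens each entry into a dict keyed by dotted paths (so that equal key names at different nesting levels do not collide), builds the header from the union of all keys, and lets csv.DictWriter fill missing fields with an empty string. The names flatten and write_csv, the ';' separator for list values, and the dotted-path convention are all assumptions made for illustration.

```python
import csv


def flatten(item, prefix=''):
    """Flatten nested dicts into one dict keyed by dotted paths."""
    flat = {}
    if isinstance(item, dict):
        for k, v in item.items():
            path = prefix + '.' + k if prefix else k
            flat.update(flatten(v, path))
    elif isinstance(item, list):
        # Join list members into one cell so the column count stays fixed.
        # Members are converted with str(), so nested dicts are not expanded.
        flat[prefix] = ';'.join(str(v) for v in item)
    else:
        flat[prefix] = item
    return flat


def write_csv(entries, out_path):
    """Write entries to CSV, leaving cells empty where a field is missing."""
    rows = [flatten(entry) for entry in entries]

    # Header = union of every key seen, in first-seen order.
    fieldnames = []
    seen = set()
    for row in rows:
        for k in row:
            if k not in seen:
                seen.add(k)
                fieldnames.append(k)

    with open(out_path, 'w', newline='', encoding='utf-8') as f_output:
        writer = csv.DictWriter(f_output, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

This version keeps all flattened rows in memory, which is probably not acceptable for a very large file. In that case the same idea could be done in two passes: one to collect the field names and one to write the rows.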

+0

This code does not work for my JSON data. I can only parse this key: "K6v8Ht6nXCjaO_ApNGr". Can you help explain this, please? My Python version is 3.6.4 – tpbafk

+0

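@tpbafk, for Python 3.x you need to make a small change to the 'open()' call (I have updated the script), but without seeing your JSON I won't be able to tell you why it does not parse everything. Perhaps you should start a new question? –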