2017-10-15

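Normalizing JSON with pandas: list indices must be integers

I am working with a large JSON file that I want to convert to CSV for further analysis. When I build the table with json_normalize, I get the following error: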

Traceback (most recent call last):
  File "/Users/Home/Downloads/JSONtoCSV/easybill.py", line 30, in <module>
    "status", "text", "text_prefix", "title", "type", "use_shipping_address", "vat_option"
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 248, in json_normalize
    _recursive_extract(data, record_path, {}, level=0)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 235, in _recursive_extract
    meta_val = _pull_field(obj, val[level:])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 169, in _pull_field
    result = result[field]
TypeError: list indices must be integers, not str

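As a first step, I tested the code extensively with a smaller, reduced version of the JSON. Now that I have put everything together for the full JSON, I get this error message.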

How can I fix this? I tried to implement normalization with pandas as shown here: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization
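For reference, here is a sketch of the kind of normalization the linked pandas documentation describes, using data modeled on its example: one output row per county (the record path), with state-level fields copied onto every row as metadata. It assumes pandas 1.0 or later, where the function is available as pd.json_normalize; the question's older setup imports it from pandas.io.json instead.

```python
import pandas as pd

# Sample data modeled on the example in the pandas documentation:
# each state (parent) holds a list of counties (child records).
data = [
    {"state": "Florida", "shortname": "FL",
     "info": {"governor": "Rick Scott"},
     "counties": [{"name": "Dade", "population": 12345},
                  {"name": "Broward", "population": 40000},
                  {"name": "Palm Beach", "population": 60000}]},
    {"state": "Ohio", "shortname": "OH",
     "info": {"governor": "John Kasich"},
     "counties": [{"name": "Summit", "population": 1234},
                  {"name": "Cuyahoga", "population": 1337}]},
]

# record_path="counties": one output row per county.
# meta: parent fields repeated on every child row; ["info", "governor"]
# is a path through nested dicts, not through a list.
result = pd.json_normalize(data, "counties",
                           ["state", "shortname", ["info", "governor"]])
print(result)
```

Note that every meta path here walks through dicts only; the list "counties" appears only as the record path.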

Here is my code so far. Thanks for your help!

Edit: here is the JSON source: https://pastebin.com/muGBPWv8

# -*- coding: utf-8 -*- 
import pandas 
import json 
import sys 
reload(sys) 
sys.setdefaultencoding('utf-8') 
from pandas.io.json import json_normalize 

# Paths 
json_file_path = "/Users/Home/Downloads/JSONtoCSV/JSON-Files/Seite0.json" 
csv_file_path = "/Users/Home/Downloads/JSONtoCSV/CSV-files/Seite0.csv" 
node = "items" 

# JSON file open, no pagination information 
with open(json_file_path) as f: 
    rawjson = json.load(f) 
data = rawjson[node] 

# remove "number" because it causes errors in pandas. 
good_data = eval(repr(data).replace("number", "numbr")) 

# normalization 
norm_data = json_normalize(good_data, "items", [ 
["address","city"], ["address","company_name"], ["address","country"], ["address","first_name"], ["address","last_name"], ["address","personal"], ["address","salutation"], ["address","street"], ["address","suffix_1"], ["address","suffix_2"], ["address","title"], ["address","zip_code"], 
"amount", "amount_net", "attachment_ids", "bank_debit_form", "cancel_id", "cash_allowance", "cash_allowance_days", "cash_allowance_text", "contact_id", "contact_label", "contact_text", "created_at", "currency", "customer_id", "discount", "discount_type", "document_date", "due_date", "edited_at", "external_id", "grace_period", "id", "is_archive", "is_draft", "is_replica", 
["items","booking_account"], ["items","cost_price_charge"], ["items","cost_price_charge_type"], ["items","cost_price_net"], ["items","cost_price_total"], ["items","description"], ["items","discount"], ["items","discount_type"], ["items","export_cost_1"], ["items","export_cost_2"], ["items","id"], ["items","numbr"], ["items","position"], ["items","position_id"], ["items","quantity"], ["items","quantity_str"], ["items","serial_number"], ["items","serial_number_id"], ["items","single_price_gross"], ["items","single_price_net"], ["items","total_price_gross"], ["items","total_price_net"], ["items","total_vat"], ["items","type"], ["items","unit"], ["items","vat_percent"], 
"label_address", "label_address", "login_id", "numbr", "paid_amount", "paid_at", "pdf_pages", "pdf_template", "project_id", "ref_id", "replica_url", 
["service_date","type"], ["service_date","date"], ["service_date","date_from"], ["service_date","date_to"], ["service_date","text"], 
"status", "text", "text_prefix", "title", "type", "use_shipping_address", "vat_option" 
]) 

# save to csv 
norm_data.to_csv(csv_file_path, sep=";") 

Answer


I see several problems with your code:

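  1. You have conflicting IDs in the metadata. For example, you have 'id' as metadata (a level-1 item) and also 'id' in the elements of your 'items'. This can be resolved by giving json_normalize a prefix for the metadata columns (the fourth positional argument, meta_prefix), like

    json_normalize(good_data, "items", [...], "meta.")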

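  2. json_normalize expects the metadata to be stored in dicts (possibly nested dicts), but some of your fields hold lists, e.g. attachment_ids, and it seems json_normalize currently cannot handle them.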

  3. Also, json_normalize does not seem to handle empty dicts such as "label_address": {}.

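  4. Finally, you probably don't need the entries ["items","booking_account"], ["items","cost_price_charge"], ... in the third (meta) argument of json_normalize: the elements along that path are already extracted for your data, because "items" is the record path (the second argument). These paths are also what trigger your error: since "items" is a list, each such meta path makes _pull_field evaluate result[field] on a list with a string key (the last frame of your traceback), which raises TypeError: list indices must be integers, not str.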
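A minimal sketch of points 1 and 4, assuming pandas 1.0 or later (pd.json_normalize) and a hypothetical, heavily reduced sample whose field names follow the question but whose values are made up. A meta path that runs through the "items" list reproduces the TypeError; keeping only parent-level fields and dict paths in meta, plus a meta_prefix, yields one row per item.

```python
import pandas as pd

# Hypothetical, heavily reduced sample shaped like the question's JSON:
# a list of documents, each with a nested "address" dict and a list of
# line "items". Field names follow the question; values are made up.
documents = [
    {"id": 1, "number": "R-1", "address": {"city": "Berlin"},
     "items": [{"id": 10, "number": "A", "quantity": 2},
               {"id": 11, "number": "B", "quantity": 1}]},
    {"id": 2, "number": "R-2", "address": {"city": "Hamburg"},
     "items": [{"id": 20, "number": "C", "quantity": 5}]},
]

# Point 4: a meta path that runs *through* the "items" list makes pandas
# evaluate doc["items"]["quantity"], i.e. index a list with a string.
error = None
try:
    pd.json_normalize(documents, "items", ["id", ["items", "quantity"]],
                      meta_prefix="meta.")
except TypeError as exc:
    error = exc
print(type(error).__name__, error)

# Fix: item fields come from record_path="items"; meta keeps only parent
# fields and paths through nested dicts such as ["address", "city"].
# Point 1: meta_prefix keeps the parent "id"/"number" columns apart from
# the items' own "id"/"number" columns.
flat = pd.json_normalize(
    documents,
    record_path="items",
    meta=["id", "number", ["address", "city"]],
    meta_prefix="meta.",
)
print(flat)
```

On Python 3 the message reads "list indices must be integers or slices, not str"; the traceback in the question shows the Python 2 wording.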

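Given these problems with json_normalize, I would not use it for your problem; I would just write plain imperative code (with loops or list comprehensions) that builds a table (a list of lists) from your JSON, and then create a pandas DataFrame from that table.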
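A sketch of this loop-based approach, again on a hypothetical reduced sample (column names and values are illustrative only, not taken from the real JSON). Each line item becomes one row of a list of lists, the parent document's fields are repeated on every row, a list-valued field is joined into one string cell, and an empty address dict is tolerated.

```python
import pandas as pd

# Hypothetical reduced sample shaped like the question's JSON (values made up).
documents = [
    {"id": 1, "number": "R-1", "address": {"city": "Berlin"},
     "attachment_ids": [5, 6],
     "items": [{"id": 10, "number": "A", "quantity": 2},
               {"id": 11, "number": "B", "quantity": 1}]},
    {"id": 2, "number": "R-2", "address": {}, "attachment_ids": [],
     "items": [{"id": 20, "number": "C", "quantity": 5}]},
]

# One row per line item; the parent document's fields are repeated on every
# row. Documents without "items" contribute no rows.
columns = ["doc_id", "doc_number", "city", "attachment_ids",
           "item_id", "item_number", "quantity"]
rows = []
for doc in documents:
    address = doc.get("address") or {}  # tolerate a missing, empty or null address
    # list-valued field flattened into a single comma-separated string
    attachments = ",".join(str(a) for a in (doc.get("attachment_ids") or []))
    for item in doc.get("items") or []:
        rows.append([doc["id"], doc["number"], address.get("city"), attachments,
                     item["id"], item["number"], item.get("quantity")])

df = pd.DataFrame(rows, columns=columns)
print(df)
```

Writing the result out would then work as in the question's code, e.g. df.to_csv(csv_file_path, sep=";", index=False). Because the column list is explicit, fields that clash by name (such as the two "id" and "number" fields) simply get distinct column names.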


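Hi, thanks for your comment! I only used json_normalize because I need a flat, normalized table, with a result like the one shown here: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization, where "Rick Scott" appears several times in the final table but only once in the source JSON. Do you have any hints on how to achieve that? I'm completely new to Python, so it's quite a struggle :( – thowi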