pandas json_normalize on JSON - list indices must be integers

I'm working with a large JSON file that I want to convert to CSV for further analysis. When I use json_normalize to build the table, I get the following error:

Traceback (most recent call last):
  File "/Users/Home/Downloads/JSONtoCSV/easybill.py", line 30, in <module>
    "status", "text", "text_prefix", "title", "type", "use_shipping_address", "vat_option"
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 248, in json_normalize
    _recursive_extract(data, record_path, {}, level=0)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 235, in _recursive_extract
    meta_val = _pull_field(obj, val[level:])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 169, in _pull_field
    result = result[field]
TypeError: list indices must be integers, not str

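As a first step, I ran a lot of tests with smaller, reduced JSON files to validate the code. Now that I have assembled everything for the complete JSON, I get this error message.

How can I solve this? I am trying to implement normalization with pandas as shown here: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization

Here is my code so far. Thanks for your help!

Edit: here is the JSON source: https://pastebin.com/muGBPWv8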

# -*- coding: utf-8 -*- 
import pandas 
import json 
import sys 
reload(sys) 
sys.setdefaultencoding('utf-8') 
from pandas.io.json import json_normalize 

# Paths 
json_file_path = "/Users/Home/Downloads/JSONtoCSV/JSON-Files/Seite0.json" 
csv_file_path = "/Users/Home/Downloads/JSONtoCSV/CSV-files/Seite0.csv" 
node = "items" 

# JSON file open, no pagination information 
with open(json_file_path) as f: 
    rawjson = json.load(f) 
data = rawjson[node] 

# remove "number" because it causes errors in pandas. 
good_data = eval(repr(data).replace("number", "numbr")) 

# normalization 
norm_data = json_normalize(good_data, "items", [ 
["address","city"], ["address","company_name"], ["address","country"], ["address","first_name"], ["address","last_name"], ["address","personal"], ["address","salutation"], ["address","street"], ["address","suffix_1"], ["address","suffix_2"], ["address","title"], ["address","zip_code"], 
"amount", "amount_net", "attachment_ids", "bank_debit_form", "cancel_id", "cash_allowance", "cash_allowance_days", "cash_allowance_text", "contact_id", "contact_label", "contact_text", "created_at", "currency", "customer_id", "discount", "discount_type", "document_date", "due_date", "edited_at", "external_id", "grace_period", "id", "is_archive", "is_draft", "is_replica", 
["items","booking_account"], ["items","cost_price_charge"], ["items","cost_price_charge_type"], ["items","cost_price_net"], ["items","cost_price_total"], ["items","description"], ["items","discount"], ["items","discount_type"], ["items","export_cost_1"], ["items","export_cost_2"], ["items","id"], ["items","numbr"], ["items","position"], ["items","position_id"], ["items","quantity"], ["items","quantity_str"], ["items","serial_number"], ["items","serial_number_id"], ["items","single_price_gross"], ["items","single_price_net"], ["items","total_price_gross"], ["items","total_price_net"], ["items","total_vat"], ["items","type"], ["items","unit"], ["items","vat_percent"], 
"label_address", "label_address", "login_id", "numbr", "paid_amount", "paid_at", "pdf_pages", "pdf_template", "project_id", "ref_id", "replica_url", 
["service_date","type"], ["service_date","date"], ["service_date","date_from"], ["service_date","date_to"], ["service_date","text"], 
"status", "text", "text_prefix", "title", "type", "use_shipping_address", "vat_option" 
]) 

# save to csv 
norm_data.to_csv(csv_file_path, sep=";")


2017-10-15 thowi

I see several problems with your code:

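You have conflicting names in the metadata. For example, you have 'id' as metadata (a level-1 field) and also 'id' as a field of the elements of 'items'. This can be resolved by passing a prefix for the metadata columns as the fourth argument (meta_prefix) to json_normalize, like

json_normalize(good_data, "items", [...], "meta.")
json_normalize expects the metadata to be stored in dictionaries (possibly nested dictionaries), but you have fields with list-type values, e.g. attachment_ids, and it seems json_normalize currently cannot handle them.
Also, it seems json_normalize cannot handle empty dictionaries, like "label_address": {}.
Finally, you probably don't need the lines ["items","booking_account"], ["items","cost_price_charge"], ... in your third (metadata) argument to json_normalize, because the elements at those paths are already retrieved as your records (via the second argument to json_normalize).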
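
To make the first and last points concrete, here is a minimal sketch. It assumes a recent pandas release, where the function is available as pandas.json_normalize (the question imports it from pandas.io.json on Python 2.7), and it uses a small invented sample rather than the real file. The traceback in the question ends in _pull_field while a metadata path is being resolved; a path such as ["items", "booking_account"] first yields the "items" list and then indexes it with a string, which is consistent with the reported TypeError. Leaving those paths out of the metadata list and adding a prefix avoids both that and the 'id' collision:

```python
import pandas as pd

# Hypothetical two-document sample shaped like the question's data:
# scalar fields, a nested "address" dict, and an "items" list per document.
docs = [
    {"id": 1, "amount": 100, "address": {"city": "Berlin"},
     "items": [{"id": 10, "quantity": 2}, {"id": 11, "quantity": 1}]},
    {"id": 2, "amount": 50, "address": {"city": "Hamburg"},
     "items": [{"id": 20, "quantity": 5}]},
]

# record_path="items" already turns every item field into a column, so the
# meta list names document-level fields only. meta_prefix keeps the
# document-level "id" from colliding with the item-level "id".
table = pd.json_normalize(
    docs,
    record_path="items",
    meta=["id", "amount", ["address", "city"]],
    meta_prefix="meta.",
)
print(table)
```

Each item becomes one row, and the document-level values are repeated on every row that belongs to the same document.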

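Given these problems with json_normalize, I would not use it for your problem. Instead, I would just write simple imperative code (with loops/list comprehensions) that builds a table (a list of lists) from your JSON, and then create a pandas DataFrame from that table.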
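
A rough sketch of that approach follows. The flatten helper, the "doc." and "item." prefixes, and the sample values are invented for illustration; only the field names come from the question. It builds a list of dicts rather than a list of lists, a small variation chosen so that rows with missing fields still line up by column name:

```python
import pandas as pd

# Hypothetical sample; field names follow the question, values are invented.
docs = [
    {"id": 1, "customer_id": 7, "attachment_ids": [3, 4], "label_address": {},
     "address": {"city": "Berlin"},
     "items": [{"id": 10, "quantity": 2}, {"id": 11, "quantity": 1}]},
    {"id": 2, "customer_id": 8, "attachment_ids": [], "label_address": {},
     "address": {"city": "Hamburg"},
     "items": [{"id": 20, "quantity": 5}]},
]

def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys.

    Lists are joined into a comma-separated string; empty dicts add no keys.
    """
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list):
            flat[name] = ",".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat

rows = []
for doc in docs:
    # Document-level fields (everything except the item list), flattened once.
    doc_fields = flatten({k: v for k, v in doc.items() if k != "items"}, "doc.")
    for item in doc.get("items", []):
        row = dict(doc_fields)  # copied onto every item row of this document
        row.update(flatten(item, "item."))
        rows.append(row)

table = pd.DataFrame(rows)
print(table)
```

The result could then be written out with table.to_csv(csv_file_path, sep=";", index=False). Note that in this sketch a document with an empty "items" list produces no rows at all; whether that is acceptable depends on the analysis.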


2017-10-15 13:27:12

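Hi - thanks for your comment! I am using json_normalize only because I need a flat, normalized table, with a result like the one shown here: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization. There, "Rick Scott" appears several times in the final table, but only once in the source JSON. Do you have any hints on how to achieve that? I am completely new to Python, so it is quite a struggle to get through :( – thowi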
