Reading nested (sub-level) JSON data with Pandas

I am stuck trying to read sub-level data with Pandas.

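Background:

I downloaded a series of data with the NYT Archive API and saved it to a JSON file. The file actually contains a list of JSON objects.

Steps:

I read the JSON file with the read_json method.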

import pandas as pd
pandas_df = pd.read_json("data.json")

When I look at a sample of the result with head(), it looks like this:

pandas_df.head() 
    copyright \ 
0 Copyright (c) 2013 The New York Times Company.... 
1 Copyright (c) 2013 The New York Times Company.... 
2 Copyright (c) 2013 The New York Times Company.... 
3 Copyright (c) 2013 The New York Times Company.... 
4 Copyright (c) 2013 The New York Times Company.... 

              response 
0 {'docs': [{'subsection_name': None, 'slideshow... 
1 {'docs': [{'subsection_name': None, 'slideshow... 
2 {'docs': [{'subsection_name': None, 'slideshow... 
3 {'docs': [{'subsection_name': None, 'slideshow... 
4 {'docs': [{'subsection_name': None, 'slideshow...

I only need the information in the response column. So I changed the code as follows:

print(pandas_df["response"].head()) 
0 {'docs': [{'subsection_name': None, 'slideshow... 
1 {'docs': [{'subsection_name': None, 'slideshow... 
2 {'docs': [{'subsection_name': None, 'slideshow... 
3 {'docs': [{'subsection_name': None, 'slideshow... 
4 {'docs': [{'subsection_name': None, 'slideshow... 
Name: response, dtype: object

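Question:

How can I get at the data inside the docs element, such as subsection_name, slideshow_credits and so on, so that I can view it in tabular form, like a DataFrame?

Please let me know if more information is needed.

Thanks.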

EDIT 1：

Adding the first element from the JSON file. The whole file is too large to post, at around 1 GB.

{ 
    "copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.", 
    "response": { 
    "meta": { 
     "hits": 7652 
    }, 
    "docs": [ 
     { 
     "web_url": "http://www.nytimes.com/interactive/2016/technology/personaltech/cord-cutting-guide.html", 
     "snippet": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.", 
     "lead_paragraph": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.", 
     "abstract": null, 
     "print_page": null, 
     "blog": [], 
     "source": "The New York Times", 
     "multimedia": [ 
      { 
      "width": 190, 
      "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg", 
      "height": 126, 
      "subtype": "wide", 
      "legacy": { 
       "wide": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg", 
       "wideheight": "126", 
       "widewidth": "190" 
      }, 
      "type": "image" 
      }, 
      { 
      "width": 600, 
      "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg", 
      "height": 346, 
      "subtype": "xlarge", 
      "legacy": { 
       "xlargewidth": "600", 
       "xlarge": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg", 
       "xlargeheight": "346" 
      }, 
      "type": "image" 
      }, 
      { 
      "width": 75, 
      "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg", 
      "height": 75, 
      "subtype": "thumbnail", 
      "legacy": { 
       "thumbnailheight": "75", 
       "thumbnail": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg", 
       "thumbnailwidth": "75" 
      }, 
      "type": "image" 
      } 
     ], 
     "headline": { 
      "main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits", 
      "kicker": "Tech Fix" 
     }, 
     "keywords": [ 
      { 
      "rank": "1", 
      "is_major": "N", 
      "name": "subject", 
      "value": "Video Recordings, Downloads and Streaming" 
      }, 
      { 
      "rank": "2", 
      "is_major": "N", 
      "name": "subject", 
      "value": "Television Sets and Media Devices" 
      }, 
      { 
      "rank": "1", 
      "is_major": "Y", 
      "name": "subject", 
      "value": "Television" 
      } 
     ], 
     "pub_date": "2016-01-01T05:00:00Z", 
     "document_type": "multimedia", 
     "news_desk": "Technology/Personal Tech", 
     "section_name": "Technology", 
     "subsection_name": "Personal Tech", 
     "byline": { 
      "person": [ 
      { 
       "firstname": "Brian", 
       "middlename": "X.", 
       "lastname": "CHEN", 
       "rank": 1, 
       "role": "reported", 
       "organization": "" 
      } 
      ], 
      "original": "By BRIAN X. CHEN" 
     }, 
     "type_of_material": "Interactive Feature", 
     "_id": "57fdfb9895d0e022439c2b57", 
     "word_count": null, 
     "slideshow_credits": null 
     }]}}

2017-03-31 disp_name

Can you post the first few lines of the complete raw JSON?

Added, please take a look.

I want to read "docs".

You should be able to extract all the elements of the docs list, which is nested under the response dictionary, into a DataFrame.

import json
import pandas as pd

# Load the whole file, then build the frame from the nested docs list
with open('data.json') as f:
    data = json.load(f)
df = pd.DataFrame(data['response']['docs'])
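
If the nested dictionaries inside each doc (for example headline) should become columns of their own, pandas.json_normalize may help. The following is only a minimal sketch, assuming pandas 1.0 or later. The `data` dictionary is a hypothetical, trimmed-down record modeled on the sample in the question, not the real file.

```python
import pandas as pd

# Hypothetical, trimmed-down record shaped like the sample in the question.
data = {
    "response": {
        "docs": [
            {
                "headline": {
                    "main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits",
                    "kicker": "Tech Fix",
                },
                "section_name": "Technology",
                "subsection_name": "Personal Tech",
                "slideshow_credits": None,
                "keywords": [
                    {"rank": "1", "is_major": "Y", "name": "subject", "value": "Television"}
                ],
            }
        ]
    }
}

# Nested dicts become dotted column names such as "headline.main";
# lists (e.g. "keywords") stay as list objects in a single column.
flat_df = pd.json_normalize(data["response"]["docs"])
print(flat_df[["headline.main", "section_name", "subsection_name"]])
```

List-valued fields such as keywords or multimedia are not expanded by this call and would need separate handling.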

2017-03-31 14:50:34

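The last line gives me an error: TypeError: list indices must be integers or slices, not str. Do you know why that is? Is it because I am reading a file that contains multiple JSON objects?

I modified the JSON input by adding one closing bracket and two closing braces. Copy that exact JSON into a file and run my code again. It should work.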
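
That TypeError is what Python raises when a list is indexed with a string, which fits a file whose top level is a list of objects rather than a single object. Below is a sketch that accepts either shape, assuming every object carries a response.docs list as in the sample. `docs_to_frame` is a hypothetical helper name and `sample` is made-up data for illustration.

```python
import json
import pandas as pd

def docs_to_frame(data):
    """Collect every response.docs entry into one DataFrame."""
    # Wrap a single object in a list so both shapes are handled the same way.
    objects = data if isinstance(data, list) else [data]
    docs = [doc for obj in objects for doc in obj["response"]["docs"]]
    return pd.DataFrame(docs)

# Hypothetical input: a list of two archive-style objects.
sample = json.loads("""
[
  {"copyright": "c", "response": {"meta": {"hits": 2}, "docs": [
      {"_id": "a1", "section_name": "Technology"},
      {"_id": "a2", "section_name": "Sports"}]}},
  {"copyright": "c", "response": {"meta": {"hits": 1}, "docs": [
      {"_id": "b1", "section_name": "Arts"}]}}
]
""")

df = docs_to_frame(sample)
print(df)
```

With the real file, the same helper could be called as `docs_to_frame(json.load(f))` inside the `with open('data.json') as f:` block. Note that json.load reads the entire file into memory, which may matter for a file of about 1 GB.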
