2017-08-08

Converting JSON to CSV using pandas

I'm having a hard time converting a JSON string like the one shown below to CSV using pandas.

Here is my sample string (in practice it is read from a file):

{ 
    "count": 8, 
    "facets": [], 
    "results": [ 
     { 
     "protocol": "DWC_ARCHIVE", 
     "taxonKey": 4332928, 
     "family": "Diaptomidae", 
     "institutionCode": "MNHN", 
     "lastInterpreted": "2017-05-17T13:20:23.744+0000", 
     "speciesKey": 4332928, 
     "gbifID": "694182141", 
     "identifiedBy": "Dussart B.", 
     "lastParsed": "2017-05-17T13:19:47.003+0000", 
     "phylum": "Arthropoda", 
     "orderKey": 679, 
     "facts": [], 
     "species": "Diaptomus kenitraensis", 
     "issues": [], 
     "occurrenceID": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2010-6707", 
     "countryCode": null, 
     "basisOfRecord": "PRESERVED_SPECIMEN", 
     "relations": [], 
     "classKey": 203, 
     "catalogNumber": "2010-6707", 
     "scientificName": "Diaptomus kenitraensis Kiefer, 1926", 
     "taxonRank": "SPECIES", 
     "familyKey": 9038, 
     "kingdom": "Animalia", 
     "publishingOrgKey": "2cd829bb-b713-433d-99cf-64bef11e5b3e", 
     "collectionCode": "IU", 
     "kingdomKey": 1, 
     "genusKey": 2114554, 
     "key": 694182141, 
     "phylumKey": 54, 
     "genericName": "Diaptomus", 
     "class": "Maxillopoda", 
     "crawlId": 116, 
     "individualCount": 1, 
     "publishingCountry": "FR", 
     "identifier": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2010-6707", 
     "lastCrawled": "2017-08-03T14:05:37.635+0000", 
     "license": "http://creativecommons.org/licenses/by/4.0/legalcode", 
     "datasetKey": "da6a07ed-9eee-460d-9448-910f542c1a7b", 
     "specificEpithet": "kenitraensis", 
     "identifiers": [], 
     "modified": "2015-06-19T19:23:01.000+0000", 
     "extensions": {}, 
     "genus": "Diaptomus", 
     "order": "Calanoida" 
     }, 
     { 
     "protocol": "DWC_ARCHIVE", 
     "taxonKey": 4332928, 
     "family": "Diaptomidae", 
     "institutionCode": "MNHN", 
     "lastInterpreted": "2017-05-17T13:19:51.210+0000", 
     "speciesKey": 4332928, 
     "gbifID": "440012453", 
     "identifiedBy": "Dussart B.", 
     "lastParsed": "2017-05-17T13:19:31.422+0000", 
     "phylum": "Arthropoda", 
     "orderKey": 679, 
     "facts": [], 
     "species": "Diaptomus kenitraensis", 
     "issues": [], 
     "occurrenceID": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2007-1537", 
     "countryCode": null, 
     "basisOfRecord": "PRESERVED_SPECIMEN", 
     "relations": [], 
     "classKey": 203, 
     "catalogNumber": "2007-1537", 
     "scientificName": "Diaptomus kenitraensis Kiefer, 1926", 
     "taxonRank": "SPECIES", 
     "familyKey": 9038, 
     "kingdom": "Animalia", 
     "publishingOrgKey": "2cd829bb-b713-433d-99cf-64bef11e5b3e", 
     "collectionCode": "IU", 
     "kingdomKey": 1, 
     "genusKey": 2114554, 
     "key": 440012453, 
     "phylumKey": 54, 
     "genericName": "Diaptomus", 
     "class": "Maxillopoda", 
     "crawlId": 116, 
     "individualCount": 8, 
     "publishingCountry": "FR", 
     "identifier": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2007-1537", 
     "lastCrawled": "2017-08-03T14:05:30.146+0000", 
     "license": "http://creativecommons.org/licenses/by/4.0/legalcode", 
     "datasetKey": "da6a07ed-9eee-460d-9448-910f542c1a7b", 
     "specificEpithet": "kenitraensis", 
     "identifiers": [], 
     "modified": "2015-06-19T19:23:00.000+0000", 
     "extensions": {}, 
     "genus": "Diaptomus", 
     "order": "Calanoida" 
     } 
    ], 
    "endOfRecords": false, 
    "limit": 2, 
    "offset": 0 
} 

What I'm interested in is the "results" part.
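Since only `results` matters, one way to isolate it is to parse the text with the standard `json` module and index that key. This is a minimal sketch; `json_string` here is a trimmed, hypothetical copy of the sample above, not the full response:

```python
import json

# Hypothetical, trimmed stand-in for the full response shown above
json_string = """
{"count": 8, "facets": [],
 "results": [
  {"gbifID": "694182141", "catalogNumber": "2010-6707", "individualCount": 1},
  {"gbifID": "440012453", "catalogNumber": "2007-1537", "individualCount": 8}
 ],
 "endOfRecords": false, "limit": 2, "offset": 0}
"""

# json.loads turns the text into a dict; "results" holds a list of dicts,
# one per occurrence record
data = json.loads(json_string)
results = data["results"]

for record in results:
    print(record["gbifID"], record["catalogNumber"])
```

Each element of `results` is an ordinary dict, which is the shape that table-building tools expect: one dict per row.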

Using pandas, I tried this:

import pandas as pd

df = pd.read_json(json_string) 
df.to_csv("output.csv", index=False, sep='\t', encoding="utf-8") 

But I get the following error:

File "C:\Python27\lib\site-packages\pandas\io\json.py", line 281, in read_json 
    date_unit).parse() 
    File "C:\Python27\lib\site-packages\pandas\io\json.py", line 349, in parse 
    self._parse_no_numpy() 
    File "C:\Python27\lib\site-packages\pandas\io\json.py", line 566, in _parse_no_numpy 
    loads(json, precise_float=self.precise_float), dtype=None) 
TypeError: Expected String or Unicode 
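The question does not show how `json_string` was produced, so the following is an assumption: if it came from `json.load`, it is already a Python dict rather than JSON text, while `read_json` expects text, a path, or a file-like object. The `TypeError: Expected String or Unicode` above is consistent with that. Either way, the records of interest sit one level down, under `results`. A sketch that skips `read_json` and hands that list straight to the `DataFrame` constructor, using a trimmed, hypothetical `data` dict:

```python
import pandas as pd

# Hypothetical, trimmed dict standing in for what json.load() returns for the file above
data = {
    "count": 8,
    "facets": [],
    "results": [
        {"gbifID": "694182141", "catalogNumber": "2010-6707", "countryCode": None},
        {"gbifID": "440012453", "catalogNumber": "2007-1537", "countryCode": None},
    ],
    "endOfRecords": False,
    "limit": 2,
    "offset": 0,
}

# Build the frame from the "results" list only: each dict becomes one row,
# and its keys become the column names
df = pd.DataFrame(data["results"])
print(df)
```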

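I also tried most of the more detailed suggestions from How can I convert JSON to CSV?, attempting to convert the above JSON directly to CSV (bypassing pandas), but without success.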
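For the route that bypasses pandas, a minimal sketch using only the standard-library `csv` module. It assumes a trimmed, hypothetical sample and writes tab-separated output to match the question's `sep='\t'`; the file name `output.csv` is taken from the question:

```python
import csv
import json

# Hypothetical, trimmed stand-in for the response shown above
json_string = """
{"count": 8,
 "results": [
  {"gbifID": "694182141", "catalogNumber": "2010-6707", "countryCode": null, "facts": []},
  {"gbifID": "440012453", "catalogNumber": "2007-1537", "countryCode": null, "facts": []}
 ]}
"""

results = json.loads(json_string)["results"]

# Collect every key seen in any record, in first-seen order, so a record
# that lacks a field still lines up under the right header
fieldnames = []
for record in results:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    # Tab-separated; restval="" fills fields a record lacks.
    # JSON null (None) is written as an empty field, lists as their str() form, e.g. []
    writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t", restval="")
    writer.writeheader()
    writer.writerows(results)
```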

Can anyone give me a hint? Thanks in advance for any help you can provide.

Best regards,

Maybe the lists and dicts in the JSON are causing the error? :) If they are always empty, you could consider just removing them. – Roelant
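If you follow that suggestion, a small sketch of stripping empty containers from each record before building the table. The helper name `drop_empty_containers` is hypothetical, and the sample record is a trimmed stand-in for one entry of `results`:

```python
# Hypothetical record with the kinds of empty fields found in the sample above
record = {
    "gbifID": "694182141",
    "facts": [],
    "issues": [],
    "extensions": {},
    "countryCode": None,
    "individualCount": 1,
}


def drop_empty_containers(rec):
    """Return a copy of rec without keys whose value is an empty list or dict.

    None and other falsy scalars (0, "") are kept on purpose.
    """
    return {
        key: value
        for key, value in rec.items()
        if not (isinstance(value, (list, dict)) and not value)
    }


cleaned = drop_empty_containers(record)
print(cleaned)
```

Applied to every element of `results`, this keeps non-empty lists and dicts untouched, so nothing is silently lost if a later record does carry data in those fields.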

Answer

You can use json_normalize:

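import json
import pandas as pd

# Load the whole file into a Python dict
with open('file.json') as data_file:
    data = json.load(data_file)

# Flatten the "results" list into one row per record
# (pd.json_normalize needs pandas >= 1.0; older versions use
#  from pandas.io.json import json_normalize)
df = pd.json_normalize(data, 'results')
print(df)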
     basisOfRecord catalogNumber  class classKey collectionCode \ 
0 PRESERVED_SPECIMEN  2010-6707 Maxillopoda  203    IU 
1 PRESERVED_SPECIMEN  2007-1537 Maxillopoda  203    IU 

    countryCode crawlId       datasetKey extensions facts \ 
0  None  116 da6a07ed-9eee-460d-9448-910f542c1a7b   {} [] 
1  None  116 da6a07ed-9eee-460d-9448-910f542c1a7b   {} [] 

    ...   protocol publishingCountry \ 
0 ...  DWC_ARCHIVE     FR 
1 ...  DWC_ARCHIVE     FR 

         publishingOrgKey relations \ 
0 2cd829bb-b713-433d-99cf-64bef11e5b3e  [] 
1 2cd829bb-b713-433d-99cf-64bef11e5b3e  [] 

         scientificName     species speciesKey \ 
0 Diaptomus kenitraensis Kiefer, 1926 Diaptomus kenitraensis 4332928 
1 Diaptomus kenitraensis Kiefer, 1926 Diaptomus kenitraensis 4332928 

    specificEpithet taxonKey taxonRank 
0 kenitraensis 4332928 SPECIES 
1 kenitraensis 4332928 SPECIES 

[2 rows x 45 columns] 
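To finish the original task, the flattened frame can be written out with the same `to_csv` options the question used. A minimal end-to-end sketch, with a trimmed, hypothetical `data` dict in place of the file:

```python
import pandas as pd

# Hypothetical, trimmed dict standing in for json.load() on the file above
data = {
    "count": 8,
    "results": [
        {"gbifID": "694182141", "catalogNumber": "2010-6707", "individualCount": 1},
        {"gbifID": "440012453", "catalogNumber": "2007-1537", "individualCount": 8},
    ],
    "offset": 0,
}

# record_path="results": one row per element of data["results"]
df = pd.json_normalize(data, record_path="results")

# Write it out with the same options the question used
df.to_csv("output.csv", index=False, sep="\t", encoding="utf-8")
```

If you also want a top-level field such as `offset` repeated on every row, `json_normalize` accepts a `meta` argument, e.g. `meta=["offset"]`.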
Yes, thanks, it works perfectly! I had actually thought about trying json_normalize, but I got the syntax wrong. – maurobio