2016-09-20 86 views
2

Convert a dictionary to a pandas DataFrame

My data looks like this:

{u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'} 

I want to convert it into a pandas DataFrame. However, when I try

df = pd.DataFrame(response.items()) 

I get a DataFrame with two columns: the first holds the key, and the second holds that key's value:

      0      1 
0 "57e01311817bc367c030b390" {"ad_since": 2016, "indoor_swimming_pool": "No... 
1 "57e01311817bc367c030b3a8" {"ad_since": 2012, "indoor_swimming_pool": "No... 

How can I get one column for each key: "ad_since", "indoor_swimming_pool", "seaside", "handicapped_access"? I would also like to keep the first column, or use the id as the index.


Try read_json: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.read_json.html


Have you tried using 'pd.DataFrame(response.items())'? For me, it does not work. – jezrael


@jezrael Thanks for your comment, I have edited my post. – mitsi
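
The read_json suggestion in the first comment could look roughly like the sketch below. It assumes that every key and every value in response is itself valid JSON text, so the pairs can be joined into one JSON document. The names doc and df_read, and the choice of convert_axes=False to keep the ids as plain strings, are illustrative assumptions and not part of the original comment.

```python
import io

import pandas as pd

response = {
    u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
    u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
}

# Keys are already quoted JSON strings and values are JSON objects,
# so "key: value" pairs can be joined into a single JSON document.
doc = u'{' + u', '.join(u'{}: {}'.format(k, v) for k, v in response.items()) + u'}'

# orient='index' treats the outer keys as row labels;
# convert_axes=False asks pandas not to reinterpret the ids.
df_read = pd.read_json(io.StringIO(doc), orient='index', convert_axes=False)
print(df_read)
```

Because the keys are parsed as JSON strings here, the surrounding double quotes disappear from the index labels.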

Answers

1

You need to convert the values from type str to dict with .apply(literal_eval) or .apply(json.loads), and then use DataFrame.from_records:

import pandas as pd 
from ast import literal_eval 

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
      u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'} 

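# The ids become the index; the JSON strings sit in a single column
df = pd.DataFrame.from_dict(response, orient='index')

# The values are still plain strings at this point
print(type(df.iloc[0,0]))
<class 'str'>

# Parse each string into a dict
df.iloc[:,0] = df.iloc[:,0].apply(literal_eval)

# Expand the dicts into one column per key, keeping the ids as the index
print(pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))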
          ad_since handicapped_access indoor_swimming_pool \ 
"57e01311817bc367c030b3a8"  2012    Yes     No 
"57e01311817bc367c030b390"  2016    Yes     No 

          seaside 
"57e01311817bc367c030b3a8"  No 
"57e01311817bc367c030b390"  No 

The same approach with json.loads:

import pandas as pd
import json

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
      u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'} 


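df = pd.DataFrame.from_dict(response, orient='index')
# Parse each JSON string into a dict
df.iloc[:,0] = df.iloc[:,0].apply(json.loads)

# Expand the dicts into one column per key, keeping the ids as the index
print(pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))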
          ad_since handicapped_access indoor_swimming_pool \ 
"57e01311817bc367c030b3a8"  2012    Yes     No 
"57e01311817bc367c030b390"  2016    Yes     No 

          seaside 
"57e01311817bc367c030b3a8"  No 
"57e01311817bc367c030b390"  No 
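
The json.loads variant above can probably be condensed by parsing everything first and then calling DataFrame.from_dict with orient='index'. The following is only a sketch under that assumption. The names parsed and df_compact are made up for illustration, and parsing the keys with json.loads (to drop the embedded double quotes from the ids) is an extra step that the answer itself does not take.

```python
import json

import pandas as pd

response = {
    u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
    u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
}

# Parse both the quoted id and the JSON payload: {id: {column: value}}.
parsed = {json.loads(key): json.loads(value) for key, value in response.items()}

# orient='index' turns the outer keys into row labels
# and the inner keys into columns.
df_compact = pd.DataFrame.from_dict(parsed, orient='index')
df_compact.index.name = 'id'
print(df_compact)
```

If the ids should stay exactly as they appear in response, including the quotes, use key instead of json.loads(key) in the comprehension.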

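With the first method (using 'literal_eval') on the whole dataset, I get the error 'ValueError: malformed string', probably because of special characters. But the second method with 'json.loads' works perfectly, thank you. – mitsi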


Glad I could help. – jezrael
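
One possible explanation for the ValueError reported in the comment above is that literal_eval only accepts Python literals, while the strings are JSON. The comment attributes the failure to special characters, and the real dataset is not shown, so the record below is a hypothetical example: it uses the JSON-only literals true and null, which json.loads understands and literal_eval rejects.

```python
import json
from ast import literal_eval

# Hypothetical record (not from the original data) with JSON-only literals.
raw = u'{"ad_since": 2016, "seaside": true, "pool": null}'

try:
    literal_eval(raw)
    literal_eval_failed = False
except ValueError as exc:
    # true and null are bare names in Python, not literals,
    # so literal_eval refuses to evaluate them.
    literal_eval_failed = True
    print('literal_eval failed:', exc)

# json.loads maps true -> True and null -> None.
record = json.loads(raw)
print(record)
```

For strings that come from a JSON source, json.loads is therefore usually the safer of the two parsers.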

1

Since the values are strings, you can use the json module and a list comprehension:

In [20]: d =  {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'} 

In [21]: import json 

In [22]: pd.DataFrame(dict([(k, [json.loads(e)[k] for e in d.values()]) for k in json.loads(list(d.values())[0])]), index=list(d.keys()))
Out[22]: 
          ad_since handicapped_access indoor_swimming_pool \ 
"57e01311817bc367c030b390"  2016    Yes     No 
"57e01311817bc367c030b3a8"  2012    Yes     No 

         seaside 
"57e01311817bc367c030b390"  No 
"57e01311817bc367c030b3a8"  No
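
A somewhat shorter variant of the same idea parses each value only once and hands pandas a list of row dicts. This is a sketch, not the answer author's code. The names rows and df_rows are illustrative, and it relies on d.values() and d.keys() iterating in the same order, which holds as long as the dict is not modified in between.

```python
import json

import pandas as pd

d = {
    u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
    u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
}

# One parsed dict per row; keys missing from a row would become NaN.
rows = [json.loads(v) for v in d.values()]

# The original keys (quotes included) are used as the index.
df_rows = pd.DataFrame(rows, index=list(d.keys()))
print(df_rows)
```

Unlike the one-liner above, this does not assume that every record has the same keys as the first one.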