2017-10-15 177 views

I want to parse a list that contains nested lists into a pandas DataFrame.

Here is a sample item from the list:

>>>result[1] 
{ 
    "account_currency": "BRL", 
    "account_id": "1600343406676896", 
    "account_name": "aaa", 
    "buying_type": "AUCTION", 
    "campaign_id": "aaa", 
    "campaign_name": "aaaL", 
    "canvas_avg_view_percent": "0", 
    "canvas_avg_view_time": "0", 
    "clicks": "1", 
    "cost_per_total_action": "8.15", 
    "cpm": "60.820896", 
    "cpp": "61.278195", 
    "date_start": "2017-10-08", 
    "date_stop": "2017-10-15", 
    "device_platform": "desktop", 
    "frequency": "1.007519", 
    "impression_device": "desktop", 
    "impressions": "134", 
    "inline_link_clicks": "1", 
    "inline_post_engagement": "1", 
    "objective": "CONVERSIONS", 
    "outbound_clicks": [ 
     { 
      "action_type": "outbound_click", 
      "value": "1" 
     } 
    ], 
    "platform_position": "feed", 
    "publisher_platform": "facebook", 
    "reach": "133", 
    "social_clicks": "1", 
    "social_impressions": "91", 
    "social_reach": "90", 
    "spend": "8.15", 
    "total_action_value": "0", 
    "total_actions": "1", 
    "total_unique_actions": "1", 
    "unique_actions": [ 
     { 
      "action_type": "landing_page_view", 
      "value": "1" 
     }, 
     { 
      "action_type": "link_click", 
      "value": "1" 
     }, 
     { 
      "action_type": "page_engagement", 
      "value": "1" 
     }, 
     { 
      "action_type": "post_engagement", 
      "value": "1" 
     } 
    ], 
    "unique_clicks": "1", 
    "unique_inline_link_clicks": "1", 
    "unique_outbound_clicks": [ 
     { 
      "action_type": "outbound_click", 
      "value": "1" 
     } 
    ], 
    "unique_social_clicks": "1" 
} 

When I convert it to a pandas DataFrame, I get:

>>>df = pd.DataFrame(result) 
>>>df 
.... 

unique_actions \ 
NaN 
[{u'value': u'1', u'action_type': u'landing_pa... 
NaN 
[{u'value': u'2', u'action_type': u'landing_pa... 
[{u'value': u'4', u'action_type': u'landing_pa... 
NaN 

unique_actions and some of the other fields are not normalized.

How can I normalize them to the same granularity?

What do you mean by "normalize to the same granularity"? What exactly do you want your result to look like?

Your structure is actually JSON. – Parfait

@Parfait Got it. How can I expand it into transposed columns?

Answers

1

You can use json_normalize, like this:

pd.io.json.json_normalize(df.unique_actions) 

I get this error: AttributeError: 'float' object has no attribute 'itervalues'
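The error is consistent with the NaN entries visible in the unique_actions column: NaN is a float, while json_normalize expects dicts or lists of dicts. A minimal sketch of one way around it follows. It assumes pandas 1.0 or later (where the function is exposed as pd.json_normalize), and the small `result` list is made-up stand-in data, not the real API output. It drops the missing rows and flattens the rest while keeping the index of the source row.

```python
import pandas as pd

# Hypothetical stand-in for the API result: the first item lacks "unique_actions".
result = [
    {"campaign_id": "a", "clicks": "0"},
    {"campaign_id": "b", "clicks": "1",
     "unique_actions": [{"action_type": "landing_page_view", "value": "1"},
                        {"action_type": "link_click", "value": "1"}]},
    {"campaign_id": "c", "clicks": "2",
     "unique_actions": [{"action_type": "link_click", "value": "2"}]},
]

df = pd.DataFrame(result)

# Items without the key become NaN (a float), which json_normalize cannot walk.
actions = df["unique_actions"].dropna()

# One entry per (source row, action); explode repeats the original index label.
exploded = actions.explode()

# Flatten the action dicts into columns and restore the source row index.
flat = pd.json_normalize(exploded.tolist())
flat.index = exploded.index
print(flat)
```

Because `flat` keeps the original row labels, it can be joined back to `df` on the index if the scalar columns are needed next to each action.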

1

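Consider json_normalize, passing the nested list as record_path and all the other fields as meta. However, because you have several nested lists, the JSON maps to three DataFrames, one per nested list: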

# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize

# Scalar fields to repeat on every row of each normalized DataFrame
merge_fields = ['account_currency', 'account_id', 'account_name', 'buying_type', 'campaign_id',
                'campaign_name', 'canvas_avg_view_percent', 'canvas_avg_view_time', 'clicks',
                'cost_per_total_action', 'cpm', 'cpp', 'date_start', 'date_stop', 'device_platform',
                'frequency', 'impression_device', 'impressions', 'inline_link_clicks', 'inline_post_engagement',
                'objective', 'platform_position', 'publisher_platform', 'reach', 'social_clicks', 'social_impressions',
                'social_reach', 'spend', 'total_action_value', 'total_actions', 'total_unique_actions',
                'unique_clicks', 'unique_inline_link_clicks', 'unique_social_clicks']

# One DataFrame per nested list; each nested entry becomes a row
unique_actions_df = json_normalize(result[1], record_path='unique_actions', meta=merge_fields)

outbound_clicks_df = json_normalize(result[1], record_path='outbound_clicks', meta=merge_fields)

unique_outbound_clicks_df = json_normalize(result[1], record_path='unique_outbound_clicks', meta=merge_fields)
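To illustrate what a record_path/meta call produces for a single record: each nested entry becomes one row, and the meta fields are repeated on every row. The sketch below uses a trimmed, made-up record with only two meta fields and assumes pandas 1.0 or later.

```python
import pandas as pd

# Trimmed, hypothetical version of one item from the list.
record = {
    "campaign_id": "aaa",
    "clicks": "1",
    "unique_actions": [
        {"action_type": "landing_page_view", "value": "1"},
        {"action_type": "link_click", "value": "1"},
    ],
}

# record_path names the nested list; meta names the scalar fields to carry along.
unique_actions_df = pd.json_normalize(
    record, record_path="unique_actions", meta=["campaign_id", "clicks"]
)
print(unique_actions_df)
```

The result has two rows (one per action) with the columns action_type, value, campaign_id and clicks.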

I get TypeError: string indices must be integers

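Are you passing exactly what you posted, result[1], or some other item? If the structure is the same across the list items, you need to iterate over 'result'. – Parfait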
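The suggestion to iterate over 'result' could be sketched roughly as below. This is only an illustration under stated assumptions: pandas 1.0 or later, a made-up `result` list, and a hypothetical helper named `normalize_field` that does not come from pandas. Items that lack the nested list are skipped, and errors="ignore" lets missing meta keys come through as NaN instead of raising. The final pivot is one possible reading of "transposed columns", namely one column per action_type.

```python
import pandas as pd

# Hypothetical stand-in data: the first item has no "unique_actions" key.
result = [
    {"campaign_id": "a", "clicks": "0"},
    {"campaign_id": "b", "clicks": "1",
     "unique_actions": [{"action_type": "landing_page_view", "value": "1"},
                        {"action_type": "link_click", "value": "1"}]},
    {"campaign_id": "c", "clicks": "2",
     "unique_actions": [{"action_type": "link_click", "value": "2"}]},
]


def normalize_field(items, field, meta):
    """Hypothetical helper: one row per nested entry of `field` across all items."""
    frames = [
        pd.json_normalize(item, record_path=field, meta=meta, errors="ignore")
        for item in items
        if isinstance(item.get(field), list)  # skip items without the nested list
    ]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


unique_actions_df = normalize_field(result, "unique_actions", ["campaign_id", "clicks"])

# Optional: spread the action types into columns, one row per campaign.
# pivot raises if a campaign repeats the same action_type, so this assumes it does not.
wide = unique_actions_df.pivot(index="campaign_id", columns="action_type", values="value")
print(wide)
```

The same helper can be called with 'outbound_clicks' and 'unique_outbound_clicks' to build the other two DataFrames.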