Pandas: DataFrames to an unbalanced Panel

I have a dictionary of DataFrame objects:

dictDF = {0: df0, 1: df1, 2: df2}

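Each of the DataFrames df0, df1, df2 represents a table at a specific date. The first column identifies a person (like a social security number) and the other columns are characteristics of that person, such as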

DataFrame df0

id  Name    Age  Gender  Job        Income
10  Daniel  40   Male    Scientist  100
5   Anna    39   Female  Doctor     250

DataFrame df1

id  Name     Age  Gender  Job       Income
67  Guto     35   Male    Engineer  100
7   Anna     39   Female  Doctor    300
9   Melissa  26   Female  Student   36

DataFrame df2

id  Name      Age  Gender  Job      Income
77  Patricia  30   Female  Dentist  300
9   Melissa   27   Female  Dentist  250
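For anyone who wants to reproduce this, here is a minimal sketch that builds the three tables above and the dictionary. The variable `columns` and the helper dictionary `ids_per_period` are only there for illustration and are not part of the original question.

```python
import pandas as pd

columns = ["id", "Name", "Age", "Gender", "Job", "Income"]

# One table per date, copied from the listings above
df0 = pd.DataFrame(
    [[10, "Daniel", 40, "Male", "Scientist", 100],
     [5, "Anna", 39, "Female", "Doctor", 250]],
    columns=columns,
)
df1 = pd.DataFrame(
    [[67, "Guto", 35, "Male", "Engineer", 100],
     [7, "Anna", 39, "Female", "Doctor", 300],
     [9, "Melissa", 26, "Female", "Student", 36]],
    columns=columns,
)
df2 = pd.DataFrame(
    [[77, "Patricia", 30, "Female", "Dentist", 300],
     [9, "Melissa", 27, "Female", "Dentist", 250]],
    columns=columns,
)

dictDF = {0: df0, 1: df1, 2: df2}

# Illustrative helper: which ids are present on each date
ids_per_period = {period: sorted(df["id"].tolist()) for period, df in dictDF.items()}
```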

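Within the table for a given date, the id (social security number) identifies the person exactly. For example, the same "Melissa" (id 9) appears in two different DataFrames. However, there are two different "Annas" (ids 5 and 7).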

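Across these DataFrames, both the people and the number of people vary over time. Some people are present on all dates, while others appear only on specific dates.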

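Is there a simple way to convert the dictionary of DataFrames into an (unbalanced) Panel object in which every id appears on all dates, and where the data is replaced by NaN if it is not available for a given id?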

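Of course, I could do this by making a list of all ids and then checking, for each date, whether a given id is present. If it is, I copy the data. Otherwise, I just write NaN.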
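The manual approach just described could be sketched roughly as follows. To keep it short, the tables are trimmed to the id and Income columns, and the names `all_ids` and `balanced` are hypothetical; `reindex` does the "copy the data or write NaN" step in one call.

```python
import pandas as pd

# Trimmed version of the question's tables (id and Income only)
dictDF = {
    0: pd.DataFrame({"id": [10, 5], "Income": [100, 250]}),
    1: pd.DataFrame({"id": [67, 7, 9], "Income": [100, 300, 36]}),
    2: pd.DataFrame({"id": [77, 9], "Income": [300, 250]}),
}

# Step 1: collect every id that appears on any date
all_ids = sorted(set().union(*(df["id"].tolist() for df in dictDF.values())))

# Step 2: align each date's table to the full id list;
# ids that are absent on that date become rows of NaN
balanced = {
    period: df.set_index("id").reindex(all_ids)
    for period, df in dictDF.items()
}
```

Each table in `balanced` then has one row per id, so all dates share the same shape.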

I wonder whether there is an easier way to do this with pandas tools.

Asked 2016-02-12 by DanielTheRocketMan

I would recommend using a MultiIndex instead of a Panel.

First, add the period to each DataFrame:

# Python 3: dict.items() replaces the Python 2-only iteritems()
for n, df in dictDF.items():
    df['period'] = n

Then concatenate everything into one big DataFrame:

import pandas as pd
big_df = pd.concat(list(dictDF.values()), ignore_index=True)
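As a possible shortcut for the two steps above, `pd.concat` also accepts a dictionary and uses its keys as the outer index level. The sketch below assumes each table is first indexed by id; the names `indexed` and `big_df_alt` are hypothetical, and the tables are trimmed to id and Income.

```python
import pandas as pd

# Trimmed version of the question's tables (id and Income only)
dictDF = {
    0: pd.DataFrame({"id": [10, 5], "Income": [100, 250]}),
    1: pd.DataFrame({"id": [67, 7, 9], "Income": [100, 300, 36]}),
    2: pd.DataFrame({"id": [77, 9], "Income": [300, 250]}),
}

# Index each table by id so the id becomes the inner index level
indexed = {period: df.set_index("id") for period, df in dictDF.items()}

# Passing the dict itself makes pandas use its keys (the periods)
# as the outer level of the resulting MultiIndex
big_df_alt = pd.concat(indexed, names=["period", "id"])
```

This yields a (period, id) MultiIndex directly, without adding a period column to the original tables.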

Now index on period and id, and you are guaranteed to have a unique index:

>>> big_df.set_index(['period', 'id'])
               Name  Age  Gender        Job  Income
period id
0      10    Daniel   40    Male  Scientist     100
       5       Anna   39  Female     Doctor     250
1      67      Guto   35    Male   Engineer     100
       7       Anna   39  Female     Doctor     300
       9    Melissa   26  Female    Student      36
2      77  Patricia   30  Female    Dentist     300
       9    Melissa   27  Female    Dentist     250

You can also reverse the order:

>>> big_df.set_index(['id', 'period']).sort_index()
               Name  Age  Gender        Job  Income
id period
5  0           Anna   39  Female     Doctor     250
7  1           Anna   39  Female     Doctor     300
9  1        Melissa   26  Female    Student      36
   2        Melissa   27  Female    Dentist     250
10 0         Daniel   40    Male  Scientist     100
67 1           Guto   35    Male   Engineer     100
77 2       Patricia   30  Female    Dentist     300
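With the id as the outer level, pulling out one person's history or one date's cross-section becomes a single lookup. A small sketch, rebuilding a reduced `big_df` (Name and Income only) so that it runs on its own; `by_person`, `melissa` and `period_1` are hypothetical names.

```python
import pandas as pd

# Reduced copy of the concatenated table (period, id, Name, Income)
big_df = pd.DataFrame({
    "period": [0, 0, 1, 1, 1, 2, 2],
    "id": [10, 5, 67, 7, 9, 77, 9],
    "Name": ["Daniel", "Anna", "Guto", "Anna", "Melissa", "Patricia", "Melissa"],
    "Income": [100, 250, 100, 300, 36, 300, 250],
})

by_person = big_df.set_index(["id", "period"]).sort_index()

# All observations of id 9 (Melissa), indexed by period
melissa = by_person.loc[9]

# Cross-section: everyone observed in period 1, indexed by id
period_1 = by_person.xs(1, level="period")
```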

You can even unstack the data quite easily:

>>> big_df.set_index(['id', 'period'])[['Income']].unstack('period')
       Income
period      0    1    2
id
5         250  NaN  NaN
7         NaN  300  NaN
9         NaN   36  250
10        100  NaN  NaN
67        NaN  100  NaN
77        NaN  NaN  300
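To get what the question literally asks for (every id present on every date, with NaN where there is no data) while staying with a MultiIndex, one option is to reindex against the full id × period product. Note that Panel was removed in pandas 0.25, so on current versions a MultiIndex is the only route anyway. This is only a sketch on tables trimmed to id and Income; `full_index` and `balanced` are hypothetical names, and `assign` is used instead of the loop so that the original tables are not modified.

```python
import pandas as pd

# Trimmed version of the question's tables (id and Income only)
dictDF = {
    0: pd.DataFrame({"id": [10, 5], "Income": [100, 250]}),
    1: pd.DataFrame({"id": [67, 7, 9], "Income": [100, 300, 36]}),
    2: pd.DataFrame({"id": [77, 9], "Income": [300, 250]}),
}

# Tag each table with its period, concatenate, and index by (id, period)
big_df = pd.concat(
    [df.assign(period=period) for period, df in dictDF.items()],
    ignore_index=True,
).set_index(["id", "period"])

# Every (id, period) combination, whether it was observed or not
full_index = pd.MultiIndex.from_product(
    [
        big_df.index.get_level_values("id").unique(),
        big_df.index.get_level_values("period").unique(),
    ],
    names=["id", "period"],
)

# Unobserved combinations become rows of NaN
balanced = big_df.reindex(full_index).sort_index()
```

The result has one row per id per period (6 ids × 3 periods here), and only the 7 observed rows carry data.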

Answered 2016-02-12 02:57:22 by Alexander
