2016-11-20

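Add extra entries to a multi-index pandas DataFrame from another multi-index pandas DataFrame

I have a multi-index pandas DataFrame that I produced with the groupby method followed by the describe method: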

grouped = self.HK_data.groupby(level=[0, 1])
summary = grouped.describe()

This gives:

Antibody  Time
Customer_Col1A2 0 count 3.000000
        mean 0.757589
        std 0.188750
        min 0.639933
        25% 0.648732
        50% 0.657532
        75% 0.816417
        max 0.975302
       10 count 3.000000
        mean 0.716279
        std 0.061939
        min 0.665601
        25% 0.681757
        50% 0.697913
        75% 0.741618
        max 0.785324
        ... .........

I calculated the SEM using:

SEM = grouped.mean() / numpy.sqrt(grouped.count())

which gives:

Antibody     Time   
Customer_Col1A2   0  0.437394 
         10 0.413544 
         120 0.553361 
         180 0.502792 
         20 0.512797 
         240 0.514609 
         30 0.505618 
         300 0.481021 
         45 0.534658 
         5  0.425800 
         60 0.430633 
         90 0.525115 
         ... ......... 
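
As a side note, the standard error of the mean is conventionally the sample standard deviation divided by the square root of the group size, whereas the expression above divides the group mean by that square root. The following is a minimal sketch of the conventional calculation, assuming a small made-up frame (the name `hk_data` and its values are illustrative, not the data from the question). pandas also provides this directly as `sem()` on a groupby object.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for HK_data; the values are made up for illustration.
hk_data = pd.DataFrame(
    {
        "Antibody": ["Customer_Col1A2"] * 6,
        "Time": [0, 0, 0, 10, 10, 10],
        "Col": [0.64, 0.66, 0.98, 0.67, 0.70, 0.79],
    }
).set_index(["Antibody", "Time"])

grouped = hk_data.groupby(level=[0, 1])

# Conventional SEM: sample standard deviation (ddof=1) over sqrt(group size).
sem_manual = grouped.std() / np.sqrt(grouped.count())

# Built-in equivalent; it also uses ddof=1 by default.
sem_builtin = grouped.sem()

print(sem_manual)
print(sem_builtin)
```

Either result has the same two-level index as the SEM frame shown above, so the rest of the question applies unchanged.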

How can I concatenate these two frames so that the SEM becomes another entry in the summary statistics?

So, something like:

Antibody  Time     
Customer_Col1A2 0 count 3.000000 
        mean 0.757589 
        std 0.188750 
        min 0.639933 
        25% 0.648732 
        50% 0.657532 
        75% 0.816417 
        max 0.975302 
        SEM 0.437394 
       10 count 3.000000 
        mean 0.716279 
        std 0.061939 
        min 0.665601 
        25% 0.681757 
        50% 0.697913 
        75% 0.741618 
        max 0.785324 
        SEM 0.413544 

I have tried pandas.concat, but it does not give me what I want.

Thanks!

Answer


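I think you first need to add a third level to the MultiIndex of SEM by assigning a new index built with MultiIndex.from_tuples, and then use concat followed by sort_index: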

import numpy as np
import pandas as pd

HK_data = pd.DataFrame({'Antibody': ['Customer_Col1A2', 'Customer_Col1A2', 'Customer_Col1A2'],
                        'Time': [0, 10, 10],
                        'Col': [7, 8, 9]})
HK_data = HK_data.set_index(['Antibody', 'Time'])
print(HK_data)
         Col 
Antibody  Time  
Customer_Col1A2 0  7 
       10  8 
       10  9 

grouped = HK_data.groupby(level=[0, 1])
summary = grouped.describe()
print(summary)
           Col 
Antibody  Time     
Customer_Col1A2 0 count 1.000000 
        mean 7.000000 
        std   NaN 
        min 7.000000 
        25% 7.000000 
        50% 7.000000 
        75% 7.000000 
        max 7.000000 
       10 count 2.000000 
        mean 8.500000 
        std 0.707107 
        min 8.000000 
        25% 8.250000 
        50% 8.500000 
        75% 8.750000 
        max 9.000000 

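# SEM as defined in the question: group mean divided by sqrt(group size)
SEM = grouped.mean() / np.sqrt(grouped.count())

# Add a constant third index level 'SEM' so the index has the same depth as summary's
new_index = list(zip(SEM.index.get_level_values('Antibody'),
                     SEM.index.get_level_values('Time'),
                     ['SEM'] * len(SEM.index)))
SEM.index = pd.MultiIndex.from_tuples(new_index, names=('Antibody', 'Time', None))

print(SEM)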
           Col 
Antibody  Time    
Customer_Col1A2 0 SEM 7.000000 
       10 SEM 6.010408 

# Stack the two frames, then sort so each SEM row sits with its (Antibody, Time) group
df = pd.concat([summary, SEM]).sort_index()
print(df)
           Col 
Antibody  Time     
Customer_Col1A2 0 25% 7.000000 
        50% 7.000000 
        75% 7.000000 
        SEM 7.000000 
        count 1.000000 
        max 7.000000 
        mean 7.000000 
        min 7.000000 
        std   NaN 
       10 25% 8.250000 
        50% 8.500000 
        75% 8.750000 
        SEM 6.010408 
        count 2.000000 
        max 9.000000 
        mean 8.500000 
        min 8.000000 
        std 0.707107 
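
Two caveats about the listing above. First, sort_index orders the statistics lexically, so SEM ends up between 75% and count rather than after max as the question asked. Second, as far as I know, pandas 0.20 and later return the describe statistics as columns rather than as a third index level, so the concat step would not line up as shown. The following is a minimal sketch for that newer layout, not the method of the answer above: it adds SEM as one more column and then moves the columns into the index with `stack`. The name `long_summary` is illustrative.

```python
import numpy as np
import pandas as pd

# Same toy data as in the answer above.
HK_data = pd.DataFrame({'Antibody': ['Customer_Col1A2'] * 3,
                        'Time': [0, 10, 10],
                        'Col': [7, 8, 9]}).set_index(['Antibody', 'Time'])

# Select the single value column so describe() yields plain statistic names.
grouped = HK_data.groupby(level=[0, 1])['Col']

# Assumption: describe() returns one row per group and one column per
# statistic (the layout of recent pandas releases).
summary = grouped.describe()

# Same formula as in the question (mean / sqrt(count)), added as a last column.
summary['SEM'] = grouped.mean() / np.sqrt(grouped.count())

# Move the statistics from the columns into a third index level.
# Depending on the pandas version, NaN entries (the std of a single value)
# may be dropped here.
long_summary = summary.stack()

print(long_summary)
```

Because SEM is appended as the last column before stacking, it follows max within each (Antibody, Time) group, and no sorting is needed.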

This works beautifully, thank you! – CiaranWelsh