2016-12-05 25 views
-2

Getting a mean from a Python DataFrame based on an index

I have a DataFrame that is the result of a groupby call:

test=uniqueStudents.groupby(['index1','index2']).count() 

test.head(10) 

I want to average the count output across each index1 value to get one overall mean per index1.

The current result and the desired output are shown below.

[Image: Current/Desired Output]

Can someone help me with Python code to achieve this? Or is there another way to get this from the dataset?

Answers

1

Use the level parameter of the groupby method, which accepts the name of an index level.

test.groupby(level='index1').mean() 

Alternatively, you can reset the index and do a normal groupby with the by parameter.

test.reset_index().groupby('index1').mean() 
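
To make the comparison concrete, here is a minimal sketch of both variants on a small made-up frame (the values and the names `by_level` and `by_column` are assumptions for illustration, not the asker's data). One thing to watch: after `reset_index()`, `index2` becomes an ordinary column and would be averaged along with the counts, so the sketch selects the value columns explicitly.

```python
import pandas as pd

# Hypothetical stand-in for the question's `test` frame
idx = pd.MultiIndex.from_tuples(
    [("01-06-15", 1), ("01-06-15", 2), ("02-06-15", 3)],
    names=["index1", "index2"],
)
test = pd.DataFrame({"column1": [10, 20, 30], "column2": [1, 3, 5]}, index=idx)

# Variant 1: group directly on the named index level
by_level = test.groupby(level="index1").mean()

# Variant 2: move the index levels into columns, then group by column name.
# Selecting the value columns keeps index2 out of the averages.
by_column = test.reset_index().groupby("index1")[["column1", "column2"]].mean()

print(by_level)
print(by_column)
```

Both variants should give the same per-`index1` column means; the `level=` form avoids the extra column selection.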
0

You need to group by the index1 level and aggregate with GroupBy.mean, then take the mean across columns with DataFrame.mean:

import numpy as np
import pandas as pd

test = pd.DataFrame({'column4': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column10': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}, 'column3': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column8': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}, 'column11': {('01-06-15', 278658): 22.0, ('01-06-15', 206905): 101.0, ('02-06-15', 225800): 308.0, ('02-06-15', 225596): 19.0, ('01-06-15', 152551): 64.0, ('01-06-15', 124337): 54.0, ('02-06-15', 235369): 7.0, ('01-06-15', 31883): 124.0, ('03-06-15', 124337): np.nan}, 'column5': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column7': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 3, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column2': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 
235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column1': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column6': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column9': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}}) 
test.index.names = ['index1','index2'] 
test = test[['column'+str(col) for col in range(1,12)]] 
print (test) 
                 column1  column2  column3  column4  column5  column6  \
index1   index2
01-06-15 31883       124      124      124      124      124      124
         124337       54       54       54       54       54       54
         152551       64       64       64       64       64       64
         206905      101      101      101      101      101      101
         278658       22       22       22       22       22       22
02-06-15 225596       19       19       19       19       19       19
         225800      308      308      308      308      308      308
         235369        7        7        7        7        7        7
03-06-15 124337       17       17       17       17       17       17

                 column7  column8  column9  column10  column11
index1   index2
01-06-15 31883       124     62.0     62.0      62.0     124.0
         124337       54     21.0     21.0      21.0      54.0
         152551       64     55.0     55.0      55.0      64.0
         206905      101     60.0     60.0      60.0     101.0
         278658       22     17.0     17.0      17.0      22.0
02-06-15 225596       19     15.0     15.0      15.0      19.0
         225800      308    280.0    280.0     280.0     308.0
         235369        3      3.0      3.0       3.0       7.0
03-06-15 124337       17      NaN      NaN       NaN       NaN
df = test.groupby(level='index1').mean().mean(axis=1).reset_index(name='val') 
print (df) 
     index1         val
0  01-06-15   64.818182
1  02-06-15  107.939394
2  03-06-15   17.000000

Another solution is to take the mean across columns first (axis=1) and then group by index1:

df = test.mean(axis=1).groupby(level='index1').mean().reset_index(name='val') 
print (df) 
     index1         val
0  01-06-15   64.818182
1  02-06-15  107.939394
2  03-06-15   17.000000
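
The two orderings give the same numbers for the sample data above, but that is not guaranteed in general: both `mean` calls skip NaN by default, so when a group has several rows and some values are missing, averaging columns first and averaging rows first can weight the remaining values differently. A minimal sketch on a made-up frame (the frame `toy` and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical two-row group with one missing value
idx = pd.MultiIndex.from_tuples([("a", 1), ("a", 2)], names=["index1", "index2"])
toy = pd.DataFrame({"x": [1.0, 3.0], "y": [10.0, np.nan]}, index=idx)

# Per-group column means first (x -> 2.0, y -> 10.0), then across columns
cols_then_rows = toy.groupby(level="index1").mean().mean(axis=1)

# Per-row means first (5.5 and 3.0), then per group
rows_then_cols = toy.mean(axis=1).groupby(level="index1").mean()

print(cols_then_rows["a"])  # 6.0
print(rows_then_cols["a"])  # 4.25
```

Which ordering is appropriate depends on whether each column or each row should carry equal weight in the overall mean.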