2017-02-13
2

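How to iterate over and perform column operations on a pandas DataFrame

Below is a snapshot of the data set. I want to write a function that computes column-wise maxima, minima, mean, etc. for each parameter (1, 2, 3, ...) with respect to each category.

Can anyone suggest a method or code for working with this data set? I need a user-defined function that computes the maximum, minimum, etc. of every parameter column for each category.

Below is the code snippet I tried: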

def stats(parameter): 
    print("######################") 
    print(parameter) 
    max = parameter.max() 
    mean = parameter.mean() 
    min = parameter.min() 
    print("stats function executed") 
for column in df1.ix[:,2:]: 
    print(column) 
    stats(column) 

Answers

1

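Iterating over a DataFrame yields the column names, not the columns themselves, so you need df1[column] to select each column:

# .ix was removed in pandas 1.0; .iloc does the same positional selection
for column in df1.iloc[:, 2:]: 
    print(column) 
    stats(df1[column]) 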
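The `stats` function in the question only prints and discards what it computes. A minimal sketch of a variant that returns the values, so they can be collected per column, might look like the following. The names `results` and `summary` are illustrative and not part of the original post, and the sample frame is assumed to have its parameter columns starting at position 2.

```python
import pandas as pd

def stats(parameter):
    """Return max, mean and min of a numeric Series as a dict."""
    return {
        "max": parameter.max(),
        "mean": parameter.mean(),
        "min": parameter.min(),
    }

# Sample data (assumed layout: two descriptive columns, then parameters).
df1 = pd.DataFrame({
    "Date": ["10-01-2017", "10-01-2017", "11-01-2017"],
    "Categories": ["Ca1", "Cat1", "Cat2"],
    "Parameter1": [7, 8, 9],
    "Parameter2": [1, 3, 5],
    "Parameter3": [7, 4, 3],
})

# Iterating a DataFrame yields column names, so index back into df1.
results = {column: stats(df1[column]) for column in df1.iloc[:, 2:]}

# One column per parameter, one row per statistic.
summary = pd.DataFrame(results)
print(summary)
```

Returning a dict keeps the function usable on any single column, while the dict comprehension gathers everything into one table.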

But it is better to use filter together with describe:

df1 = pd.DataFrame({'Date':['10-01-2017','10-01-2017','11-01-2017'], 
        'Categories':['Ca1','Cat1','Cat2'], 
        'Parameter1':[7,8,9], 
        'Parameter2':[1,3,5], 
        'Parameter3':[7,4,3]}) 

print (df1) 
  Categories        Date  Parameter1  Parameter2  Parameter3
0        Ca1  10-01-2017           7           1           7
1       Cat1  10-01-2017           8           3           4
2       Cat2  11-01-2017           9           5           3

df = df1.filter(like='Parameter').describe() 
print (df) 
       Parameter1  Parameter2  Parameter3
count         3.0         3.0    3.000000
mean          8.0         3.0    4.666667
std           1.0         2.0    2.081666
min           7.0         1.0    3.000000
25%           7.5         2.0    3.500000
50%           8.0         3.0    4.000000
75%           8.5         4.0    5.500000
max           9.0         5.0    7.000000

Finally, you can filter the output to keep only the rows you need:

L = ['mean','max','min'] 
print (df.loc[L]) 
      Parameter1  Parameter2  Parameter3
mean         8.0         3.0    4.666667
max          9.0         5.0    7.000000
min          7.0         1.0    3.000000
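If only a few statistics are needed, passing a list of function names to `DataFrame.agg` is one way to compute just those rows, without running `describe` and filtering afterwards. A small sketch on the same sample data (the variable name `summary` is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": ["10-01-2017", "10-01-2017", "11-01-2017"],
    "Categories": ["Ca1", "Cat1", "Cat2"],
    "Parameter1": [7, 8, 9],
    "Parameter2": [1, 3, 5],
    "Parameter3": [7, 4, 3],
})

# Keep only the columns whose name contains 'Parameter',
# then compute the requested statistics for each of them.
summary = df1.filter(like="Parameter").agg(["mean", "max", "min"])
print(summary)
```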
2

Using groupby and the built-in describe function, you get:

In [7]: df = pd.DataFrame({'Categories': ['a', 'a', 'b', 'b'], 'Param1': [42, 10, 123.23, 0.1], 'Param2': 
    ...: [13, 16, 12.23, -2]}) 

In [8]: df 
Out[8]: 
  Categories  Param1  Param2
0          a   42.00   13.00
1          a   10.00   16.00
2          b  123.23   12.23
3          b    0.10   -2.00

In [9]: df.groupby('Categories').describe() 
Out[9]: 
                      Param1     Param2
Categories
a          count    2.000000   2.000000
           mean    26.000000  14.500000
           std     22.627417   2.121320
           min     10.000000  13.000000
           25%     18.000000  13.750000
           50%     26.000000  14.500000
           75%     34.000000  15.250000
           max     42.000000  16.000000
b          count    2.000000   2.000000
           mean    61.665000   5.115000
           std     87.066058  10.062129
           min      0.100000  -2.000000
           25%     30.882500   1.557500
           50%     61.665000   5.115000
           75%     92.447500   8.672500
           max    123.230000  12.230000

If you unstack this, you get:

In [10]: df.groupby('Categories').describe().unstack() 
Out[10]: 
           Param1                                                           \
            count    mean        std   min      25%     50%      75%     max
Categories
a             2.0  26.000  22.627417  10.0  18.0000  26.000  34.0000   42.00
b             2.0  61.665  87.066058   0.1  30.8825  61.665  92.4475  123.23

           Param2
            count    mean        std   min      25%     50%      75%    max
Categories
a             2.0  14.500   2.121320  13.0  13.7500  14.500  15.2500  16.00
b             2.0   5.115  10.062129  -2.0   1.5575   5.115   8.6725  12.23
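Since the question asks only for maxima, minima and means per category, a narrower alternative to `describe` is to combine `groupby` with `agg`. The sketch below reuses the sample frame from this answer; the name `per_category` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Categories": ["a", "a", "b", "b"],
    "Param1": [42, 10, 123.23, 0.1],
    "Param2": [13, 16, 12.23, -2],
})

# One row per category; columns form a two-level index of
# (parameter, statistic), e.g. ('Param1', 'max').
per_category = df.groupby("Categories").agg(["max", "min", "mean"])
print(per_category)

# A single value can be looked up by category and (parameter, statistic).
print(per_category.loc["a", ("Param1", "max")])
```

Selecting `per_category["Param1"]` returns the statistics for that one parameter as an ordinary table indexed by category.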
+0

Thanks @languitar for the descriptive answer and for improving my knowledge of pandas functionality. – dany99