2017-08-09 78 views
1

我有一个GROUPBY创建一个数据框:Python的大熊猫 - 转换列百分比上GROUPBY DF

hmdf = pd.DataFrame(hm01) 
new_hm01 = hmdf[['FinancialYear','Month','FirstReceivedDate']] 

hm05 = new_hm01.pivot_table(index=['FinancialYear','Month'], aggfunc='count') 
vals1 = ['April ', 'May  ', 'June  ', 'July  ', 'August ', 'September', 'October ', 'November ', 'December ', 'January ', 'February ', 'March '] 

df_hm = new_hm01.groupby(['Month', 'FinancialYear']).size().unstack(fill_value=0).rename(columns=lambda x: '{}'.format(x)) 
df_hml = df_hm.reindex(vals1) 

的DF如下:

FinancialYear 2014/2015 2015/2016 2016/2017 2017/2018 
Month    
April    34   24   22   20 
May     29   26   21   25 
June    19   39   22   20 
July    23   39   18   20 
August    36   30   34   0 
September   35   23   41   0 
October    36   37   27   0 
November   38   31   30   0 
December   36   41   23   0 
January    34   30   35   0 
February   37   26   37   0 
March    36   31   33   0 

列名和变量(threeYr,twoYr,oneYr,Yr),我想转换数据框,以便数字是每个列的总数的百分比,但我无法让它工作。

这就是我想要的:

FinancialYear  2014/2015 2015/2016 2016/2017 2017/2018 
Month    
April     9%   6%   6%   24% 
May      7%   7%   6%   29% 
June     5%   10%   6%   24% 
July     6%   10%   5%   24% 
August     9%   8%   10%   0% 
September    9%   6%   12%   0% 
October     9%   10%   8%   0% 
November    10%   8%   9%   0% 
December    9%   11%   7%   0% 
January     9%   8%   10%   0% 
February    9%   7%   11%   0% 
March     9%   8%   10%   0% 

谁能帮我做这个?

编辑:我试过在这个链接发现的响应:pandas convert columns to percentages of the totals .....我无法让我的数据框工作+它不能很好地解释(对我)如何使它适用于任何DF。我相信John Galt的回应比这个回应更好(我的观点)。

+1

的[熊猫转换列的百分比可能的复制总计](https://stackoverflow.com/questions/42006346/pandas-convert-columns-to-percentages-of-the-totals) –

+0

接下来,我无法让它为我的数据框工作。因此我的问题。谢谢你,虽然 – ScoutEU

回答

5

这里有一种方法

In [1371]: (100. * df/df.sum()).round(0) 
Out[1371]: 
       2014/2015 2015/2016 2016/2017 2017/2018 
FinancialYear 
April    9.0  6.0  6.0  24.0 
May     7.0  7.0  6.0  29.0 
June     5.0  10.0  6.0  24.0 
July     6.0  10.0  5.0  24.0 
August    9.0  8.0  10.0  0.0 
September   9.0  6.0  12.0  0.0 
October    9.0  10.0  8.0  0.0 
November   10.0  8.0  9.0  0.0 
December    9.0  11.0  7.0  0.0 
January    9.0  8.0  10.0  0.0 
February    9.0  7.0  11.0  0.0 
March    9.0  8.0  10.0  0.0 

而且,如果你想四舍五入到1位小数有值与“%”的字符串

In [1375]: (100. * df/df.sum()).round(1).astype(str) + '%' 
Out[1375]: 
       2014/2015 2015/2016 2016/2017 2017/2018 
FinancialYear 
April    8.7%  6.4%  6.4%  23.5% 
May    7.4%  6.9%  6.1%  29.4% 
June    4.8%  10.3%  6.4%  23.5% 
July    5.9%  10.3%  5.2%  23.5% 
August    9.2%  8.0%  9.9%  0.0% 
September   8.9%  6.1%  12.0%  0.0% 
October   9.2%  9.8%  7.9%  0.0% 
November   9.7%  8.2%  8.7%  0.0% 
December   9.2%  10.9%  6.7%  0.0% 
January   8.7%  8.0%  10.2%  0.0% 
February   9.4%  6.9%  10.8%  0.0% 
March    9.2%  8.2%  9.6%  0.0% 
+0

约翰,这工作非常好!非常感谢你:)...非常感谢不同的解决方案 – ScoutEU