2017-10-11

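Pandas - Splitting my dataframe by columns into rows

This is a project I've been trying to get through for the past few days. We're looking for better ways to integrate our financial data into our dashboards, but the software we use outputs our data in a horrible format. It can't be fed into any kind of program, because it is designed for people to browse visually and get a rough idea.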

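I'd like advice on how to write the code properly, but also on whether I'm going about this the wrong way entirely. This data has already been heavily cleaned, so please let me know if anything is horribly wrong: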

               Expense Categories  Jan Actual  Jan Budget  Feb Actual  \
3  5600 Direct Personnel Expenses     2521.73           0     -290.57
4         6000 Automobile Expense      909.33        1314      483.15
5       6160 Funeral Home Expense        1072     1800.02           0
6                6400 Lab Expense           0           0       65.18
9      6100 Marketing & Promotion      543.13     1850.01     1158.41

Also, during cleaning I stored variables such as:

department = "PR"
direct_indirect = {'5600 Direct Personnel Expenses': 'Direct Expense'}  # etc.

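My end goal is to include a budget summary for every department in the dashboard I designed in Tableau, so I believe the best final result would look like this: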

Expense Category  Direct/Indirect  Department  Month-Year  Actual  Budget
6400 Lab Expense  Direct Expense   PR          jan 2016    0       0
6400 Lab Expense  Direct Expense   PR          feb 2016    0       0
6400 Lab Expense  Direct Expense   PR          mar 2016    0       0
6400 Lab Expense  Direct Expense   PR          apr 2016    0       0
6400 Lab Expense  Direct Expense   PR          may 2016    0       0

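I'm struggling with how to accomplish this. I'm completely unsure how to create multiple rows for each expense type in a new DataFrame, where every two columns become a new set of numbers. The only way I can think of is to use: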

for index, row in df1.iterrows(): 

But I'm lost on how I would iterate over each column and then assign the values to the new DataFrame.
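For comparison, here is a minimal sketch of the explicit-loop approach described above. It assumes a small hypothetical frame that uses the question's column naming ("<Month> Actual" / "<Month> Budget") and hard-codes the year 2016. It works, but row-by-row iteration is usually slower and wordier than the vectorized reshaping in the answers.

```python
import pandas as pd

# Hypothetical wide frame shaped like the one in the question
wide = pd.DataFrame({
    "Expense Categories": ["6000 Automobile Expense", "6400 Lab Expense"],
    "Jan Actual": [909.33, 0.0],
    "Jan Budget": [1314.0, 0.0],
    "Feb Actual": [483.15, 65.18],
})

records = []
for _, row in wide.iterrows():
    # Group this row's values by month, e.g. {"Jan": {"Actual": ..., "Budget": ...}}
    by_month = {}
    for col in wide.columns.drop("Expense Categories"):
        month, kind = col.split(" ", 1)  # "Jan Actual" -> ("Jan", "Actual")
        by_month.setdefault(month, {})[kind] = row[col]
    # Emit one output row per month; a missing Budget becomes missing (NaN)
    for month, values in by_month.items():
        records.append({
            "Expense Category": row["Expense Categories"],
            "Month-Year": f"{month} 2016",  # year assumed, as in the question
            "Actual": values.get("Actual"),
            "Budget": values.get("Budget"),
        })

long_df = pd.DataFrame(records)
print(long_df)
```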

Please let me know if I've left out any details you need. I appreciate your help.

安迪


I think you want to look at MultiIndexing. You can get a lot of what you want with clever pivots, etc. – Keith

Answers


You can reshape your DataFrame by using df.columns.str.split and stack:

import sys
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    'Expense Categories': ['5600 Direct Personnel Expenses', '6000 Automobile Expense',
                           '6160 Funeral Home Expense', '6400 Lab Expense',
                           '6100 Marketing & Promotion'],
    'Feb Actual': [-290.57, 483.15, 0.0, 65.18, 1158.41],
    'Jan Actual': [2521.73, 909.33, 1072.0, 0.0, 543.13],
    'Jan Budget': [0.0, 1314.0, 1800.02, 0.0, 1850.01]})

# Split labels such as 'Jan Actual' into a two-level column MultiIndex
df = df.set_index('Expense Categories')
df.columns = df.columns.str.split(expand=True)
df.columns.names = ['Month-Year', None]
# Move the month level into the row index, then back out into ordinary columns
df = df.stack('Month-Year')
df = df.reset_index()
# Constant columns for this department
df['Direct/Indirect'] = 'Direct Expense'
df['Department'] = 'PR'
df['Month-Year'] = df['Month-Year'] + ' 2016'

# Widen the display so the frame prints without wrapping
with pd.option_context('display.width', sys.maxsize):
    print(df)

yields

               Expense Categories Month-Year   Actual   Budget Direct/Indirect Department
0  5600 Direct Personnel Expenses   Feb 2016  -290.57      NaN  Direct Expense         PR
1  5600 Direct Personnel Expenses   Jan 2016  2521.73     0.00  Direct Expense         PR
2         6000 Automobile Expense   Feb 2016   483.15      NaN  Direct Expense         PR
3         6000 Automobile Expense   Jan 2016   909.33  1314.00  Direct Expense         PR
4       6160 Funeral Home Expense   Feb 2016     0.00      NaN  Direct Expense         PR
5       6160 Funeral Home Expense   Jan 2016  1072.00  1800.02  Direct Expense         PR
6                6400 Lab Expense   Feb 2016    65.18      NaN  Direct Expense         PR
7                6400 Lab Expense   Jan 2016     0.00     0.00  Direct Expense         PR
8      6100 Marketing & Promotion   Feb 2016  1158.41      NaN  Direct Expense         PR
9      6100 Marketing & Promotion   Jan 2016   543.13  1850.01  Direct Expense         PR
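The answer above assigns the same 'Direct Expense' label to every row. If the label differs per category, one option is to look it up from a dict shaped like the question's direct_indirect with Series.map. This is a minimal sketch: the two-row frame is a cut-down version of the result above, and the 'Indirect Expense' label for 6000 is made up for illustration.

```python
import pandas as pd

# Long-format frame like the one produced above (two categories only)
df = pd.DataFrame({
    "Expense Categories": ["5600 Direct Personnel Expenses", "6000 Automobile Expense"],
    "Month-Year": ["Jan 2016", "Jan 2016"],
    "Actual": [2521.73, 909.33],
    "Budget": [0.0, 1314.0],
})

# Hypothetical lookup in the shape of the question's direct_indirect dict;
# the classification of 6000 here is invented for illustration.
direct_indirect = {
    "5600 Direct Personnel Expenses": "Direct Expense",
    "6000 Automobile Expense": "Indirect Expense",
}
department = "PR"

# map() looks up each category; categories missing from the dict become NaN
df["Direct/Indirect"] = df["Expense Categories"].map(direct_indirect)
df["Department"] = department
print(df)
```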

Explanation

df = df.set_index('Expense Categories') 
df.columns = df.columns.str.split(expand=True) 
df.columns.names = ['Month-Year',None] 

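These lines create a MultiIndex for the column index. It splits the Month from the Actual/Budget part of the column labels. The set_index call is used here to hide the Expense Categories column from this operation. At this point df looks like this: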

Month-Year                         Feb     Jan
                                Actual  Actual  Budget
Expense Categories
5600 Direct Personnel Expenses -290.57 2521.73    0.00
6000 Automobile Expense         483.15  909.33 1314.00
6160 Funeral Home Expense         0.00 1072.00 1800.02
6400 Lab Expense                 65.18    0.00    0.00
6100 Marketing & Promotion     1158.41  543.13 1850.01
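To see the column step in isolation, here is a small sketch that applies Index.str.split(expand=True) to the three column labels from the sample data. On an Index, expand=True returns a MultiIndex (one level per piece) rather than a DataFrame.

```python
import pandas as pd

cols = pd.Index(["Feb Actual", "Jan Actual", "Jan Budget"])
# Splitting on whitespace with expand=True yields a two-level MultiIndex
mi = cols.str.split(expand=True)
mi.names = ["Month-Year", None]
print(mi)
print(mi.get_level_values("Month-Year").tolist())  # → ['Feb', 'Jan', 'Jan']
```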

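Now we can move Jan/Feb (or, more precisely, the Month-Year level of the column index) into the row index using stack; reset_index then turns it into an ordinary column: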

df = df.stack('Month-Year') 

yields

                                            Actual   Budget
Expense Categories             Month-Year
5600 Direct Personnel Expenses Feb         -290.57      NaN
                               Jan         2521.73     0.00
6000 Automobile Expense        Feb          483.15      NaN
                               Jan          909.33  1314.00
6160 Funeral Home Expense      Feb            0.00      NaN
                               Jan         1072.00  1800.02
6400 Lab Expense               Feb           65.18      NaN
                               Jan            0.00     0.00
6100 Marketing & Promotion     Feb         1158.41      NaN
                               Jan          543.13  1850.01
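The stack and reset_index steps can be checked on a one-row sketch of the same frame, built directly with MultiIndex.from_tuples rather than via str.split. Depending on your pandas version, stack may emit a FutureWarning about its default behaviour; the two rows produced here should be the same either way.

```python
import pandas as pd

# One-row frame with MultiIndex columns, mirroring the intermediate df above
df = pd.DataFrame(
    [[-290.57, 2521.73, 0.0]],
    index=pd.Index(["5600 Direct Personnel Expenses"], name="Expense Categories"),
    columns=pd.MultiIndex.from_tuples(
        [("Feb", "Actual"), ("Jan", "Actual"), ("Jan", "Budget")],
        names=["Month-Year", None],
    ),
)

# stack moves the Month-Year column level into the row index...
stacked = df.stack("Month-Year")
# ...and reset_index turns both row-index levels into ordinary columns
long_df = stacked.reset_index()
print(long_df)
```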

Using melt and pivot_table:

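# Starts from the original wide df (before any set_index)
df = df.melt('Expense Categories')
# 'Feb Actual' -> Month 'Feb', Type 'Actual'
df[['Month', 'Type']] = df.variable.str.split(' ', expand=True)
df = pd.pivot_table(df, index=['Expense Categories', 'Month'], columns='Type', values='value').reset_index()
df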

Out[1176]: 
Type             Expense Categories Month   Actual   Budget
0    5600 Direct Personnel Expenses   Feb  -290.57      NaN
1    5600 Direct Personnel Expenses   Jan  2521.73     0.00
2           6000 Automobile Expense   Feb   483.15      NaN
3           6000 Automobile Expense   Jan   909.33  1314.00
4        6100 Marketing & Promotion   Feb  1158.41      NaN
5        6100 Marketing & Promotion   Jan   543.13  1850.01
6         6160 Funeral Home Expense   Feb     0.00      NaN
7         6160 Funeral Home Expense   Jan  1072.00  1800.02
8                  6400 Lab Expense   Feb    65.18      NaN
9                  6400 Lab Expense   Jan     0.00     0.00
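The melt step is easier to follow with the intermediate result visible. Here is a sketch on a hypothetical one-row frame that uses the question's column naming. melt keeps Expense Categories as an identifier column and turns every other cell into one (variable, value) row, which is what the str.split then breaks apart.

```python
import pandas as pd

df = pd.DataFrame({
    "Expense Categories": ["6400 Lab Expense"],
    "Feb Actual": [65.18],
    "Jan Actual": [0.0],
    "Jan Budget": [0.0],
})

# One row per original cell: columns Expense Categories, variable, value
melted = df.melt("Expense Categories")
print(melted)
# Splitting "variable" on the space gives the Month and Type columns
melted[["Month", "Type"]] = melted["variable"].str.split(" ", expand=True)
print(melted[["Month", "Type", "value"]])
```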

We're almost there. Then:

df['department'] = 'PR'
df['Direct/Indirect'] = 'Direct Expense'
df['Month-Year'] = df['Month'] + ' 2016'  # e.g. 'Feb' -> 'Feb 2016'
df
Out[1182]:
Type             Expense Categories Month   Actual   Budget department Direct/Indirect Month-Year
0    5600 Direct Personnel Expenses   Feb  -290.57      NaN         PR  Direct Expense   Feb 2016
1    5600 Direct Personnel Expenses   Jan  2521.73     0.00         PR  Direct Expense   Jan 2016
2           6000 Automobile Expense   Feb   483.15      NaN         PR  Direct Expense   Feb 2016
3           6000 Automobile Expense   Jan   909.33  1314.00         PR  Direct Expense   Jan 2016
4        6100 Marketing & Promotion   Feb  1158.41      NaN         PR  Direct Expense   Feb 2016
5        6100 Marketing & Promotion   Jan   543.13  1850.01         PR  Direct Expense   Jan 2016
6         6160 Funeral Home Expense   Feb     0.00      NaN         PR  Direct Expense   Feb 2016
7         6160 Funeral Home Expense   Jan  1072.00  1800.02         PR  Direct Expense   Jan 2016
8                  6400 Lab Expense   Feb    65.18      NaN         PR  Direct Expense   Feb 2016
9                  6400 Lab Expense   Jan     0.00     0.00         PR  Direct Expense   Jan 2016
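If Tableau should treat Month-Year as a date rather than as text, one option is to parse strings such as 'Jan 2016' with pd.to_datetime. This is a minimal sketch: the Month-Date column name is hypothetical, and the format string assumes English month abbreviations followed by a space and a four-digit year.

```python
import pandas as pd

df = pd.DataFrame({"Month-Year": ["Jan 2016", "Feb 2016"]})
# %b = abbreviated month name, %Y = four-digit year; each value becomes
# a timestamp on the first day of that month
df["Month-Date"] = pd.to_datetime(df["Month-Year"], format="%b %Y")
print(df)
```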

That's a clean way to do it! – unutbu


@unutbu Pleasantly surprised by your comment :-) – Wen