2017-08-09 55 views
1

Python pandas equivalent of PROC TRANSPOSE

I have a SAS PROC TRANSPOSE step that I am trying to replicate in pandas.

Here is an example:

import pandas as pd

ID = ['ID1', 'ID1', 'ID1', 'ID1', 'ID1']
obs_week = [201701, 201701, 201701, 201701, 201701]
weeks_id = [1, 2, 3, 4, 5]
spend = [100, 200, 300, 400, 500]
# list(...) materializes the zip so the constructor also works on Python 3
df = pd.DataFrame(list(zip(ID, obs_week, weeks_id, spend)), columns=['id', 'obs_week', 'weeks_id', 'spend'])
df

This gives a table like this:

    id  obs_week  weeks_id  spend
0  ID1    201701         1    100
1  ID1    201701         2    200
2  ID1    201701         3    300
3  ID1    201701         4    400
4  ID1    201701         5    500

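I want to transpose this so that each combination of id and obs_week becomes a single row and the weeks_id values become new columns with a prefix.

The SAS code looks like this: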

proc transpose data=spend out=spend_hh (drop = _label_ _name_) prefix=spend_; 
    by id obs_week; 
    id weeks_id; 
    var spend; 
run; 

I managed to get close using df.pivot_table:

df.pivot_table(index=['id','obs_week'], columns='weeks_id', aggfunc=sum, fill_value=0) 

which gives a table like this:

             spend
weeks_id         1    2    3    4    5
id  obs_week
ID1 201701     100  200  300  400  500

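My problem is that I want the columns 1 2 3 4 5 to be renamed to spend_1, spend_2, etc.

I also want to do this for several different variables in the file, but I think I can just restrict the selection to the fields I want.

My result should look like this:

    id  obs_week  spend_1  spend_2  spend_3  spend_4  spend_5
0  ID1    201701      100      200      300      400      500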

Is this just a matter of collapsing the headers somehow?

I would also like id and obs_week not to be part of the index.

Answers

0

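You need a list comprehension to build the flat column names first, then reset_index to move id and obs_week out of the index, with rename_axis making sure no weeks_id label is left on the column axis: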

# One row per (id, obs_week), one column per weeks_id
df = df.pivot_table(index=['id','obs_week'], columns='weeks_id', aggfunc=sum, fill_value=0)

# The columns are now (value name, weeks_id) tuples; join each into e.g. 'spend_1'
df.columns = ['{}_{}'.format(x[0], x[1]) for x in df.columns]
df = df.reset_index().rename_axis(None, axis=1)
print(df)
    id  obs_week  spend_1  spend_2  spend_3  spend_4  spend_5
0  ID1    201701      100      200      300      400      500

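Alternatively, build the names with str.join in place of format:

# weeks_id is an integer, so it has to be cast to str before joining
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index().rename_axis(None, axis=1)
print(df)
    id  obs_week  spend_1  spend_2  spend_3  spend_4  spend_5
0  ID1    201701      100      200      300      400      500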
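
The question also mentions doing this for several variables at once. Below is a minimal sketch of how the same flattening idea might extend to that case. The second value column `units` and the helper name `transpose_wide` are hypothetical, made up for illustration; neither appears in the original data.

```python
import pandas as pd


def transpose_wide(frame, by, id_col, value_cols):
    """Pivot value_cols wide, naming columns '<value>_<id>' (hypothetical helper)."""
    wide = frame.pivot_table(index=by, columns=id_col, values=value_cols,
                             aggfunc='sum', fill_value=0)
    # With a list of values, the columns are (value name, id value) tuples
    wide.columns = ['{}_{}'.format(name, key) for name, key in wide.columns]
    # Move the 'by' keys out of the index into ordinary columns
    return wide.reset_index()


df = pd.DataFrame({
    'id': ['ID1'] * 5,
    'obs_week': [201701] * 5,
    'weeks_id': [1, 2, 3, 4, 5],
    'spend': [100, 200, 300, 400, 500],
    'units': [1, 2, 3, 4, 5],  # assumed extra variable, for illustration only
})

result = transpose_wide(df, by=['id', 'obs_week'], id_col='weeks_id',
                        value_cols=['spend', 'units'])
print(result.columns.tolist())
```

Passing the value columns as a list keeps the column index two-level even for a single variable, so the same tuple-unpacking comprehension works in both cases.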
1

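Here is a one-liner. Note that pivot_table defaults to aggfunc='mean' when none is given; that matches the sum here only because each id/obs_week/weeks_id combination occurs once: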

In [1446]: (df.pivot_table(index=['id', 'obs_week'], columns=['weeks_id'], values='spend') 
       .add_prefix('spend_') 
       .reset_index()) 
Out[1446]: 
weeks_id id obs_week spend_1 spend_2 spend_3 spend_4 spend_5 
0   ID1 201701  100  200  300  400  500 

In [1449]: (df.pivot_table(index=['id', 'obs_week'], columns=['weeks_id'], values='spend') 
       .add_prefix('spend_') 
       .reset_index() 
       .rename_axis(None, axis=1)) 
Out[1449]: 
    id obs_week spend_1 spend_2 spend_3 spend_4 spend_5 
0 ID1 201701  100  200  300  400  500
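
If each id/obs_week/weeks_id combination is known to occur only once, no aggregation is needed at all. Here is a sketch of that variant using set_index and unstack, which raises a ValueError on duplicate keys instead of silently aggregating them. It is offered as an alternative under that uniqueness assumption, not as part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['ID1'] * 5,
    'obs_week': [201701] * 5,
    'weeks_id': [1, 2, 3, 4, 5],
    'spend': [100, 200, 300, 400, 500],
})

wide = (df.set_index(['id', 'obs_week', 'weeks_id'])['spend']
          .unstack('weeks_id')         # weeks_id values become columns
          .add_prefix('spend_')        # 1 -> spend_1, 2 -> spend_2, ...
          .reset_index()               # id and obs_week back to ordinary columns
          .rename_axis(None, axis=1))  # drop the leftover 'weeks_id' axis label
print(wide)
```

Because nothing is aggregated, the values keep their original integer dtype.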