2016-12-14

How do I arrange a DataFrame by multiple indexes?

I have a DataFrame like this:

Date Shift Machine_number production 
9/1/2016 C 1 0.795578112 
9/1/2016 C 2 0.40730688    
9/1/2016 C 3 0.41150592 
9/1/2016 C 4 0.40310784    
9/1/2016 C 5 0.376233984 
9/2/2016 A 1 0.470486495    
9/2/2016 A 2 0.41360544 
9/2/2016 A 3 0.41780448 
9/2/2016 A 4 0.40520736    
9/2/2016 A 5 0.329204736 
9/2/2016 B 1 0.472911683    
9/2/2016 B 2 0.4094064 
9/2/2016 B 3 0.4094064    
9/2/2016 B 4 0.41570496 
9/2/2016 B 5 0.366436224 

I want to create a DataFrame with multiple index levels:

Date Machine No. Shift production 
9/1/2016 1 c 0.795578112 
9/2/2016 1 a 0.470486495 
9/2/2016 1 b 0.472911683 

Thanks.

I tried:

idx0 = np.array(df['Machine_number'])
idx1 = np.array(df['Shift'])
df2 = DataFrame(index=[idx0, idx1], columns=df["production"])

OK, what have you tried? – EdChum

@EdChum, yes, I have tried. – Dheeraj

Don't post code in comments; edit your question instead. – EdChum

Answers

1

I think you need set_index:

# by 2 columns (assigned to a new name so the original df stays intact)
df1 = df.set_index(['Machine_number','Shift'])
print (df1)
                          Date  production
Machine_number Shift
1              C      9/1/2016    0.795578
2              C      9/1/2016    0.407307
3              C      9/1/2016    0.411506
4              C      9/1/2016    0.403108
5              C      9/1/2016    0.376234
1              A      9/2/2016    0.470486
2              A      9/2/2016    0.413605
3              A      9/2/2016    0.417804
4              A      9/2/2016    0.405207
5              A      9/2/2016    0.329205
1              B      9/2/2016    0.472912
2              B      9/2/2016    0.409406
3              B      9/2/2016    0.409406
4              B      9/2/2016    0.415705
5              B      9/2/2016    0.366436

# by 2 columns, then keep only the production column by subsetting
df2 = df.set_index(['Machine_number','Shift'])[['production']]
print (df2)
                      production
Machine_number Shift
1              C        0.795578
2              C        0.407307
3              C        0.411506
4              C        0.403108
5              C        0.376234
1              A        0.470486
2              A        0.413605
3              A        0.417804
4              A        0.405207
5              A        0.329205
1              B        0.472912
2              B        0.409406
3              B        0.409406
4              B        0.415705
5              B        0.366436

# by 3 columns
df3 = df.set_index(['Date', 'Machine_number','Shift'])
print (df3)
                               production
Date     Machine_number Shift
9/1/2016 1              C        0.795578
         2              C        0.407307
         3              C        0.411506
         4              C        0.403108
         5              C        0.376234
9/2/2016 1              A        0.470486
         2              A        0.413605
         3              A        0.417804
         4              A        0.405207
         5              A        0.329205
         1              B        0.472912
         2              B        0.409406
         3              B        0.409406
         4              B        0.415705
         5              B        0.366436
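
Once the three-level index is in place, rows for a single machine can be pulled out by level. The snippet below is only a minimal sketch: the frame is rebuilt from six rows of the question (machines 1 and 2), and the names indexed and machine_1 are my own, not part of the original code.

```python
import pandas as pd

# Six rows copied from the question (machines 1 and 2 only).
df = pd.DataFrame({
    'Date': ['9/1/2016', '9/2/2016', '9/2/2016',
             '9/1/2016', '9/2/2016', '9/2/2016'],
    'Shift': ['C', 'A', 'B', 'C', 'A', 'B'],
    'Machine_number': [1, 1, 1, 2, 2, 2],
    'production': [0.795578112, 0.470486495, 0.472911683,
                   0.40730688, 0.41360544, 0.4094064],
})

# Three-level index in the order asked for in the question.
indexed = df.set_index(['Date', 'Machine_number', 'Shift'])

# Cross-section: every row for machine 1. The selected level is dropped,
# so the result is indexed by (Date, Shift).
machine_1 = indexed.xs(1, level='Machine_number')
print(machine_1)
```

xs keeps the rows in their original order, so machine 1 comes back as C, A, B, which matches the expected output in the question.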

A solution with sort_values:

# parentheses allow the method chain to continue on the next line
df = (df.sort_values(['Machine_number','Shift'], ascending=[True,False])
        .reset_index(drop=True))
# if you need to change the order of the columns
df = df[['Date','Machine_number','Shift','production']]
print (df)
        Date  Machine_number Shift  production
0   9/1/2016               1     C    0.795578
1   9/2/2016               1     B    0.472912
2   9/2/2016               1     A    0.470486
3   9/1/2016               2     C    0.407307
4   9/2/2016               2     B    0.409406
5   9/2/2016               2     A    0.413605
6   9/1/2016               3     C    0.411506
7   9/2/2016               3     B    0.409406
8   9/2/2016               3     A    0.417804
9   9/1/2016               4     C    0.403108
10  9/2/2016               4     B    0.415705
11  9/2/2016               4     A    0.405207
12  9/1/2016               5     C    0.376234
13  9/2/2016               5     B    0.366436
14  9/2/2016               5     A    0.329205

If you need the custom order C, A, B, use an ordered Categorical and set the order with the categories parameter:

import pandas as pd

# ordered Categorical, so that Shift sorts as C < A < B
df['Shift'] = pd.Categorical(df['Shift'], categories=['C','A','B'], ordered=True)
df = df.sort_values(['Machine_number','Shift']).reset_index(drop=True)
print (df)
        Date Shift  Machine_number  production
0   9/1/2016     C               1    0.795578
1   9/2/2016     A               1    0.470486
2   9/2/2016     B               1    0.472912
3   9/1/2016     C               2    0.407307
4   9/2/2016     A               2    0.413605
5   9/2/2016     B               2    0.409406
6   9/1/2016     C               3    0.411506
7   9/2/2016     A               3    0.417804
8   9/2/2016     B               3    0.409406
9   9/1/2016     C               4    0.403108
10  9/2/2016     A               4    0.405207
11  9/2/2016     B               4    0.415705
12  9/1/2016     C               5    0.376234
13  9/2/2016     A               5    0.329205
14  9/2/2016     B               5    0.366436
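
The two ideas can also be combined: sort with the ordered Categorical first, then move the columns into the index. This is a sketch under my own assumptions rather than the only way to do it. The six rows are taken from the question but deliberately shuffled, and the names shift_order, ordered and result are made up for the example.

```python
import pandas as pd

# Six rows from the question, shuffled so the sort has work to do.
df = pd.DataFrame({
    'Date': ['9/2/2016', '9/1/2016', '9/2/2016',
             '9/2/2016', '9/2/2016', '9/1/2016'],
    'Shift': ['B', 'C', 'A', 'A', 'B', 'C'],
    'Machine_number': [1, 1, 1, 2, 2, 2],
    'production': [0.472911683, 0.795578112, 0.470486495,
                   0.41360544, 0.4094064, 0.40730688],
})

# Custom shift order: C sorts before A, and A before B.
shift_order = ['C', 'A', 'B']
df['Shift'] = pd.Categorical(df['Shift'], categories=shift_order, ordered=True)

# Sort by machine, then by the custom shift order.
ordered = df.sort_values(['Machine_number', 'Shift'])

# set_index keeps the row order, so the sorted rows carry over to the index.
result = ordered.set_index(['Date', 'Machine_number', 'Shift'])
print(result)
```

Sorting before set_index means the result does not depend on how sort_index treats a categorical level.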

Your first post did produce my expected output, but I can't find it now. – Dheeraj

I added it back to the answer. – jezrael

jezrael, thank you – Dheeraj