2017-06-14
1

Pandas: select one group and reshape the remaining groups into columns

I have a DataFrame that looks like this:

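from io import StringIO  # pandas.compat.StringIO was removed in later pandas versions

import pandas as pd

# the sample data is whitespace-separated, so the separator is given explicitly
origin = pd.read_csv(StringIO('''label type value
x a 1
x b 2
y a 4
y b 5
z a 7
z c 9'''), sep=r'\s+')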

origin
Out[5]:
  label type  value
0     x    a      1
1     x    b      2
2     y    a      4
3     y    b      5
4     z    a      7
5     z    c      9

I want to reshape it into something like this:

  label type  value  y_value  z_value
0     x    a      1        4        7
1     x    b      2        5      NaN

Here y_value and z_value are matched by type.

Answers

1

You can first filter with boolean indexing. For df2, also use isin to drop the rows whose type does not occur in df1['type']. Then use pivot with add_suffix, and finally join:

a = 'x'
df = origin  # the question's DataFrame
df1 = df[df['label'] == a]
# rows of the other labels whose type also occurs in df1
df2 = df[(df['label'] != a) & (df['type'].isin(df1['type']))]
df3 = df2.pivot(index='type', columns='label', values='value').add_suffix('_value')
print(df3)
label  y_value  z_value
type
a          4.0      7.0
b          5.0      NaN

df3 = df1.join(df3, on='type')
print(df3)
  label type  value  y_value  z_value
0     x    a      1      4.0      7.0
1     x    b      2      5.0      NaN
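
The steps of this answer can be wrapped in a small helper so that the label to keep is a parameter. This is a minimal sketch, not part of the original answer: the function name `spread_other_labels` and its parameter names are hypothetical, and it assumes each (type, label) pair occurs at most once, since `pivot` raises an error on duplicates.

```python
from io import StringIO

import pandas as pd


def spread_other_labels(df, keep, key="type", group="label", value="value"):
    """Keep the rows where `group == keep` and add one column per other label.

    Hypothetical helper for illustration; assumes (key, group) pairs are unique.
    """
    kept = df[df[group] == keep]
    # Rows of the other labels whose key also occurs in the kept group.
    others = df[(df[group] != keep) & df[key].isin(kept[key])]
    # One column per remaining label, named e.g. "y_value".
    wide = others.pivot(index=key, columns=group, values=value).add_suffix("_" + value)
    # Look up each kept row's key in the index of the wide table.
    return kept.join(wide, on=key)


origin = pd.read_csv(StringIO("""label type value
x a 1
x b 2
y a 4
y b 5
z a 7
z c 9"""), sep=r"\s+")

result = spread_other_labels(origin, "x")
print(result)
```

Calling `spread_other_labels(origin, "y")` instead would keep the y rows and produce `x_value` and `z_value` columns.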
0

You can use a pivot:

origin_temp = origin.pivot(index='type', columns='label', values='value')

Output:

type    x    y    z
a     1.0  4.0  7.0
b     2.0  5.0  NaN
c     NaN  NaN  9.0
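
The answer names pivot_table while its code calls pivot. On the question's data both give the same table, because every (type, label) pair occurs once. The sketch below shows where they differ, using a made-up frame `dup` (not from the question) in which the pair (a, x) occurs twice: `pivot` raises a `ValueError`, while `pivot_table` aggregates the duplicates, here with the mean.

```python
import pandas as pd

# Hypothetical data: the pair (type="a", label="x") appears twice.
dup = pd.DataFrame({
    "label": ["x", "x", "y"],
    "type": ["a", "a", "a"],
    "value": [1, 3, 4],
})

# pivot cannot reshape when an index/column pair is duplicated.
try:
    dup.pivot(index="type", columns="label", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead of failing.
table = dup.pivot_table(index="type", columns="label", values="value", aggfunc="mean")
print(pivot_failed)
print(table)
```

If duplicates are possible in real data, the aggregation function is a choice that has to be made explicitly; the mean is only an example.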

Keep only what interests you:

# type 'c' does not occur for label 'x', so drop that row
origin_temp = origin_temp.drop('c').reset_index()
origin_temp = origin_temp.drop('x', axis=1)
origin_temp = origin_temp[['y', 'z']]
origin_temp.columns = [i + '_value' for i in origin_temp]

Output:

   y_value  z_value
0      4.0      7.0
1      5.0      NaN

Then select the rows you want to keep:

origin_temp_2 = origin[origin['label'] == 'x']

Output:

  label type  value
0     x    a      1
1     x    b      2

Finally, concat the two:

# concat with axis=1 aligns on the index; both frames are indexed 0, 1 here
origine_final = pd.concat([origin_temp, origin_temp_2], axis=1)

Output:

   y_value  z_value label type  value
0      4.0      7.0     x    a      1
1      5.0      NaN     x    b      2
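
Note that the concat step pairs rows by index, so it works here because the x rows happen to sit at positions 0 and 1 and type 'c' was dropped by hand. A minimal sketch of a variant that matches rows on type instead, so neither assumption is needed, could look like this. The names `wide`, `others`, `kept` and `final` are illustrative and not from the original answer.

```python
from io import StringIO

import pandas as pd

origin = pd.read_csv(StringIO("""label type value
x a 1
x b 2
y a 4
y b 5
z a 7
z c 9"""), sep=r"\s+")

# One row per type, one column per label.
wide = origin.pivot(index="type", columns="label", values="value")

# Drop the kept label and rename the rest to "<label>_value".
others = wide.drop(columns="x").add_suffix("_value")

# Rows to keep, matched to the wide table by their type.
kept = origin[origin["label"] == "x"]
final = kept.merge(others, left_on="type", right_index=True, how="left")
print(final)
```

Because the left merge only looks up the types present in `kept`, the row for type 'c' never reaches the result and does not have to be dropped explicitly.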