2017-05-05 67 views
3

Python, pandas: spreading a key/value column pair into multiple columns

Using pandas, I have the following DataFrame:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
   ...:                    'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
   ...:                    'value': [10, 40, 20, 30, 10, 40, 50]})
   ...: df
Out[2]:
   month type  value
0      1   T1     10
1      1   T1     40
2      1   T4     20
3      2   T2     30
4      2   T3     10
5      3   T1     40
6      3   T3     50

How can it be processed to produce the result below?

Out[3]: 
   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3

Answers

4

pandas
A clever use of pd.get_dummies

pd.get_dummies(df.type).mul(df.value, 0).join(df.month) 

   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3
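
For readers who find the one-liner dense, here is a sketch that unpacks it into three steps. The intermediate names (`dummies`, `spread`, `result`) are mine, not part of the answer, and I pass `dtype=int` so the indicator columns are 0/1 integers regardless of the pandas version's default.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# Step 1: one indicator column per distinct type (1 where the row has that type, else 0)
dummies = pd.get_dummies(df['type'], dtype=int)

# Step 2: multiply every row of indicators by that row's value;
# axis=0 aligns the Series with the row index instead of the column labels
spread = dummies.mul(df['value'], axis=0)

# Step 3: attach the month column, aligned on the shared row index
result = spread.join(df['month'])
print(result)
```

Since each row has exactly one non-zero indicator, the multiplication leaves the value in the column of its type and zeros everywhere else.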

numpy
Or the same idea, but supercharged

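import numpy as np

# u: sorted distinct types; inv: position of each row's type within u
u, inv = np.unique(df.type.values, return_inverse=True)
eye = np.eye(u.size, dtype=int)
v = df.value.values
m = df.month.values
# eye[inv] is a one-hot row per input row; scale it by the value and append month
pd.DataFrame(
    np.column_stack([eye[inv] * v[:, None], m]),
    df.index, np.append(u, 'month')
)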

   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3
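
The core of the numpy version is the identity-matrix indexing trick, which may be easier to follow in isolation. The sketch below uses plain arrays copied from the question's data; the names `types`, `values`, `one_hot` and `spread` are illustrative only.

```python
import numpy as np

types = np.array(["T1", "T1", "T4", "T2", "T3", "T1", "T3"])
values = np.array([10, 40, 20, 30, 10, 40, 50])

# u holds the sorted distinct labels; inv[i] is the position of types[i] within u
u, inv = np.unique(types, return_inverse=True)

# Row k of the identity matrix is the one-hot vector for label u[k],
# so indexing the matrix with inv picks one one-hot row per input row
one_hot = np.eye(u.size, dtype=int)[inv]

# values[:, None] has shape (7, 1); broadcasting against (7, 4) scales each row
spread = one_hot * values[:, None]
print(u)
print(spread)
```

This does the same job as get_dummies followed by mul, just without building any intermediate pandas objects.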

Timing

%timeit pd.get_dummies(df.type).mul(df.value, 0).join(df.month) 
1000 loops, best of 3: 1.1 ms per loop 

%%timeit 
u, inv = np.unique(df.type.values, return_inverse=True) 
eye = np.eye(u.size, dtype=int) 
v = df.value.values 
m = df.month.values 
pd.DataFrame(
    np.column_stack([eye[inv] * v[:, None], m]), 
    df.index, np.append(u, 'month') 
) 
10000 loops, best of 3: 189 µs per loop 

%%timeit 
(df.set_index(['type'],append=True)['value'] 
    .unstack(fill_value=0)).join(df[['month']]) 
100 loops, best of 3: 1.92 ms per loop 

%%timeit 
d1 = (df.set_index(['month','type'], append=True)['value']
      .unstack(fill_value=0)
      .reset_index(level=1))

cols = d1.columns[1:].tolist() + d1.columns[:1].tolist() 
d1 = d1.reindex_axis(cols, axis=1) 
d1 
100 loops, best of 3: 2.48 ms per loop 
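
The numbers above were measured on the 7-row frame from the question, so they are probably best read as relative rather than absolute. If you want to rerun the comparison outside IPython, something like the following sketch with the standard `timeit` module should work. The wrapper function names are mine, and I build the column list with `list(u) + ['month']` rather than `np.append`, which is a small deviation from the code above.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})


def with_get_dummies(frame):
    # Indicator columns scaled by the value, then month joined on the index
    dummies = pd.get_dummies(frame['type'], dtype=int)
    return dummies.mul(frame['value'], axis=0).join(frame['month'])


def with_numpy(frame):
    # One-hot rows taken from an identity matrix, scaled by the value
    u, inv = np.unique(frame['type'].to_numpy(), return_inverse=True)
    one_hot = np.eye(u.size, dtype=int)[inv]
    data = np.column_stack([one_hot * frame['value'].to_numpy()[:, None],
                            frame['month'].to_numpy()])
    return pd.DataFrame(data, index=frame.index, columns=list(u) + ['month'])


def with_unstack(frame):
    # Move type into the index, pivot it into columns, then join month
    wide = frame.set_index('type', append=True)['value'].unstack(fill_value=0)
    return wide.join(frame[['month']])


runs = 100
for fn in (with_get_dummies, with_numpy, with_unstack):
    seconds = timeit.timeit(lambda: fn(df), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```

Results will vary with the machine, the pandas and numpy versions, and above all the size of the frame.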
+0

Looks clever to me. – jezrael

+0

@jezrael Thank you! – piRSquared

+0

@piRSquared, that is really fast! – MaxU

3

You can use a combination of set_index and unstack to get the T1 - T4 columns, and then join the month column like this:

(df.set_index(['type'],append=True)['value'] 
    .unstack(fill_value=0)).join(df[['month']]) 
#    T1  T2  T3  T4  month
# 0  10   0   0   0      1
# 1  40   0   0   0      1
# 2   0   0   0  20      1
# 3   0  30   0   0      2
# 4   0   0  10   0      2
# 5  40   0   0   0      3
# 6   0   0  50   0      3
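
In case the chained call is hard to read, here is the same approach split into steps, as a sketch. The intermediate names `indexed`, `wide` and `result` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# Appending 'type' to the existing row index gives a two-level index: (row number, type)
indexed = df.set_index('type', append=True)['value']

# unstack pivots the innermost level ('type') into columns;
# (row, type) combinations that do not occur are filled with 0
wide = indexed.unstack(fill_value=0)

# month was not part of the pivot, so join it back on the row index
result = wide.join(df[['month']])
print(result)
```

Keeping the original row number in the index is what makes each row unique, so unstack never has to deal with duplicate (index, type) pairs.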
2

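You can use set_index, unstack, and reset_index. Finally, to change the column order, add reindex_axis (removed in pandas 1.0; on current versions use reindex(columns=...) instead).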

df = (df.set_index(['month','type'], append=True)['value']
        .unstack(fill_value=0)
        .reset_index(level=1))
# reorder columns so that month comes last
cols = df.columns[1:].tolist() + df.columns[:1].tolist()
# originally reindex_axis(cols, axis=1), which was removed in pandas 1.0
df = df.reindex(columns=cols)
print(df)
type  T1  T2  T3  T4  month
0     10   0   0   0      1
1     40   0   0   0      1
2      0   0   0  20      1
3      0  30   0   0      2
4      0   0  10   0      2
5     40   0   0   0      3
6      0   0  50   0      3