Fill missing rows in a pandas DataFrame

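import numpy as np
import pandas as pd

data = {
    'node1': [1, 1, 1, 2, 2, 5],
    'node2': [8, 16, 22, 5, 25, 10],
    'weight': [1, 1, 1, 1, 1, 1],
}
df = pd.DataFrame(data, columns=['node1', 'node2', 'weight'])

# Number the rows within each node1 group, then spread node2 across those positions.
# fillna(0) matches the zeros in the output below; fillna(np.nan) would leave NaN in place.
df2 = (df.assign(Cu=df.groupby('node1').cumcount()).set_index('Cu').groupby('node1')
         .apply(lambda x: x['node2']).unstack('Cu').fillna(0))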

Output:

1   8.0  16.0  22.0
2   5.0  25.0   0.0
5  10.0   0.0   0.0

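This is the output I am getting, but the output I need also includes the rows that are missing from the data: rows such as 3 and 4 should appear with zeros in every column.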

Source

2017-10-07 Dev_123

Why are you asking the same question again? – Wen

Here are a few ways to do this.

Option 1

In [36]: idx = np.arange(df.node1.min(), df.node1.max()+1) 

In [37]: df.groupby('node1')['node2'].apply(list).apply(pd.Series).reindex(idx).fillna(0) 
Out[37]: 
         0     1     2
node1
1      8.0  16.0  22.0
2      5.0  25.0   0.0
3      0.0   0.0   0.0
4      0.0   0.0   0.0
5     10.0   0.0   0.0

Option 2

In [39]: (df.groupby('node1')['node2'].apply(lambda x: pd.Series(x.values)) 
      .unstack().reindex(idx).fillna(0)) 
Out[39]: 
         0     1     2
node1
1      8.0  16.0  22.0
2      5.0  25.0   0.0
3      0.0   0.0   0.0
4      0.0   0.0   0.0
5     10.0   0.0   0.0

Option 3

In [55]: pd.DataFrame.from_dict(
       {i: x.values for i, x in df.groupby('node1')['node2']}, 
       orient='index').reindex(idx).fillna(0) 
Out[55]: 
      0     1     2
1   8.0  16.0  22.0
2   5.0  25.0   0.0
3   0.0   0.0   0.0
4   0.0   0.0   0.0
5  10.0   0.0   0.0

Then weigh efficiency against readability for your use case.
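As a rough way to make that comparison, the sketch below rebuilds the example frame, wraps each option in a function, checks that all three return the same values, and times them with `timeit`. The function names (`option1`, `option2`, `option3`) and the repetition count are illustrative assumptions, and timings on a six-row frame say little about how the options scale to larger data.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'node1': [1, 1, 1, 2, 2, 5],
    'node2': [8, 16, 22, 5, 25, 10],
    'weight': [1, 1, 1, 1, 1, 1],
})
# Every node1 value from the smallest to the largest, including the absent 3 and 4.
idx = np.arange(df.node1.min(), df.node1.max() + 1)


def option1():
    # Collect each group's node2 values into a list, then expand the lists into columns.
    return (df.groupby('node1')['node2'].apply(list).apply(pd.Series)
              .reindex(idx).fillna(0))


def option2():
    # Re-number each group's values from 0, then unstack that inner level into columns.
    return (df.groupby('node1')['node2'].apply(lambda x: pd.Series(x.values))
              .unstack().reindex(idx).fillna(0))


def option3():
    # Build the frame from a dict of arrays, one row per node1 value.
    return pd.DataFrame.from_dict(
        {i: x.values for i, x in df.groupby('node1')['node2']},
        orient='index').reindex(idx).fillna(0)


results = [option1(), option2(), option3()]
same = all(np.array_equal(results[0].to_numpy(), r.to_numpy()) for r in results[1:])
print('same values:', same)

for func in (option1, option2, option3):
    # Total seconds for 20 calls (hypothetical repetition count).
    print(func.__name__, timeit.timeit(func, number=20))
```

For a meaningful comparison, run the same functions against a frame of realistic size rather than this toy example.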

Source

2017-10-07 13:07:32 Zero

In [15]: idx = np.arange(df.node1.min(), df.node1.max()+1) 

In [16]: (df.pivot_table(index='node1',
                         columns=df.groupby('node1').cumcount(),
                         values='node2',
                         fill_value=0)
            .reindex(idx)
            .fillna(0))
Out[16]:
         0     1     2
node1
1      8.0  16.0  22.0
2      5.0  25.0   0.0
3      0.0   0.0   0.0
4      0.0   0.0   0.0
5     10.0   0.0   0.0
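The two fill steps in this answer do different jobs: `fill_value=0` in `pivot_table` fills the gaps inside rows that already exist (node 2 has only two values, node 5 only one), while `reindex` is what adds the rows for nodes 3 and 4. A minimal sketch of that split, assuming you also want integers back instead of floats; the `pos` column and the `wide`/`full` names are hypothetical helpers introduced here, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'node1': [1, 1, 1, 2, 2, 5],
    'node2': [8, 16, 22, 5, 25, 10],
    'weight': [1, 1, 1, 1, 1, 1],
})
idx = np.arange(df['node1'].min(), df['node1'].max() + 1)

# 'pos' (hypothetical name) is each row's position within its node1 group: 0, 1, 2, ...
wide = (df.assign(pos=df.groupby('node1').cumcount())
          .pivot_table(index='node1', columns='pos', values='node2',
                       fill_value=0))   # fills gaps inside the existing rows 1, 2, 5

# reindex adds the absent node1 values; fill_value=0 zero-fills those new rows directly.
full = wide.reindex(idx, fill_value=0).astype(int)
print(full)
```

Passing `fill_value=0` to `reindex` makes the trailing `.fillna(0)` unnecessary, and `astype(int)` is safe here only because every value is a whole number.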

Source

2017-10-07 12:59:36 MaxU
