Combine conditional row data into a new DataFrame

I have a DataFrame read from a CSV.

       time  node txrx  src  dest  txid  hops
0  34355146     2   TX    2     1     1   NaN
1  34373907     1   RX    2     1     1   1.0
2  44284813     2   TX    2     1     2   NaN
3  44302557     1   RX    2     1     2   1.0
4  44596500     3   TX    3     1     2   NaN
5  44630682     1   RX    3     1     2   2.0
6  50058251     2   TX    2     1     3   NaN
7  50075994     1   RX    2     1     3   1.0
8  51338658     3   TX    3     1     3   NaN
9  51382629     1   RX    3     1     3   2.0
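For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame above from inline text. The whitespace-separated layout and the name `raw` are assumptions for illustration; the real CSV is not shown in the question.

```python
import io

import pandas as pd

# Sample rows copied from the table above; the separator is an assumption
raw = """time node txrx src dest txid hops
34355146 2 TX 2 1 1 NaN
34373907 1 RX 2 1 1 1.0
44284813 2 TX 2 1 2 NaN
44302557 1 RX 2 1 2 1.0
44596500 3 TX 3 1 2 NaN
44630682 1 RX 3 1 2 2.0
50058251 2 TX 2 1 3 NaN
50075994 1 RX 2 1 3 1.0
51338658 3 TX 3 1 3 NaN
51382629 1 RX 3 1 3 2.0
"""

# "NaN" is parsed as a missing value by default, so hops becomes a float column
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```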

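I need to create a new DataFrame that takes the values in the TX/RX rows and produces a single row for each pair:

(1) Take the time from the 'time' column. If the value in 'txrx' is 'TX', put it in a 'tx_time' column; if the value is 'RX', put it in an 'rx_time' column (within the same row of the new DataFrame).
(2) The value of 'hops' is taken from the RX row.
(3) This is done for each ['src', 'dest', 'txid'] group.
(4) The 'node' column is ignored.

The resulting df should look like this: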

    tx_time   rx_time  src  dest  txid  hops
0  34355146  34373907    2     1     1     1
1  44284813  44302557    2     1     2     1
2  44596500  44630682    3     1     2     2
3  50058251  50075994    2     1     3     1
4  51338658  51382629    3     1     3     2

I understand how to do step (3), but I'm stuck on how to approach (1) and (2). Any suggestions?

Source

2017-10-10 mbadd

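I went with @Wen's pivot_table solution, but the defaultdict approach from piRSquared and the concat approach from Vaishali both work as well. I'm sure there's a discussion to be had about which is more efficient :) – mbadd

Using pivot_table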

df.bfill().pivot_table(index=['src','dest','txid','hops'],columns=['txrx'],values='time').reset_index() 
Out[766]:
txrx  src  dest  txid  hops        RX        TX
0       2     1     1   1.0  34373907  34355146
1       2     1     2   1.0  44302557  44284813
2       2     1     3   1.0  50075994  50058251
3       3     1     2   2.0  44630682  44596500
4       3     1     3   2.0  51382629  51338658

Or using unstack

df.bfill().set_index(['src','dest','txid','hops','txrx']).time.unstack(-1).reset_index() 
Out[768]:
txrx  src  dest  txid  hops        RX        TX
0       2     1     1   1.0  34373907  34355146
1       2     1     2   1.0  44302557  44284813
2       2     1     3   1.0  50075994  50058251
3       3     1     2   2.0  44630682  44596500
4       3     1     3   2.0  51382629  51338658

PS: I left out the .rename(columns={}) step here, because the renaming would make the code too long...

Source
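The renaming step left out above might look like the following sketch. The sample frame is rebuilt inline so the snippet runs on its own. The integer casts, the column reordering, and the choice to back-fill only the 'hops' column are my assumptions rather than part of the answer. Note that the back-fill only gives the right result when each TX row is immediately followed by its matching RX row, as in the sample data.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({
    "time": [34355146, 34373907, 44284813, 44302557, 44596500,
             44630682, 50058251, 50075994, 51338658, 51382629],
    "node": [2, 1, 2, 1, 3, 1, 2, 1, 3, 1],
    "txrx": ["TX", "RX"] * 5,
    "src":  [2, 2, 2, 2, 3, 3, 2, 2, 3, 3],
    "dest": [1] * 10,
    "txid": [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "hops": [nan, 1.0, nan, 1.0, nan, 2.0, nan, 1.0, nan, 2.0],
})

cols = ["tx_time", "rx_time", "src", "dest", "txid", "hops"]

out = (
    df.assign(hops=df["hops"].bfill())          # copy each RX hop count onto the TX row before it
      .pivot_table(index=["src", "dest", "txid", "hops"],
                   columns="txrx", values="time")
      .reset_index()
      .rename(columns={"TX": "tx_time", "RX": "rx_time"})
      .rename_axis(columns=None)                # drop the leftover "txrx" label on the columns
      .astype({"tx_time": "int64", "rx_time": "int64", "hops": "int64"})
      .reindex(columns=cols)                    # column order requested in the question
)
print(out)
```

The rows come back sorted by (src, dest, txid, hops) rather than by time, so a final `.sort_values("tx_time")` would be needed to match the ordering shown in the question.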

2017-10-10 15:16:28 Wen

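The default level for unstack is -1. There's no need to pass it. – piRSquared

@piRSquared Got it! :-) – Wen

pivot_table is very nice, and thanks for the .rename() hint! – mbadd

Here is a version using concat, though I think @Wen's pivot solution would be more efficient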

df_tx = df[::2].reset_index().drop(['index', 'txrx', 'node'], axis = 1).rename(columns = {'time': 'tx_time'}) 
df_rx = df[1::2].reset_index().drop(['index', 'txrx', 'node'], axis = 1).rename(columns = {'time': 'rx_time'}) 

pd.concat([df_tx, df_rx], axis=1).T.drop_duplicates().T.dropna(axis=1)

You get

      tx_time  src  dest  txid     rx_time  hops
0  34355146.0  2.0   1.0   1.0  34373907.0   1.0
1  44284813.0  2.0   1.0   2.0  44302557.0   1.0
2  44596500.0  3.0   1.0   2.0  44630682.0   2.0
3  50058251.0  2.0   1.0   3.0  50075994.0   1.0
4  51338658.0  3.0   1.0   3.0  51382629.0   2.0

Source

2017-10-10 15:32:42 Vaishali

A defaultdict approach
This might actually be faster for the OP's purposes.
If speed matters, check for yourself. Your mileage may vary.

from collections import defaultdict

import pandas as pd

# d[column_name][(src, dest, txid)] -> value
d = defaultdict(lambda: defaultdict(dict))
cols = 'tx_time rx_time src dest txid hops'.split()

for t in df.itertuples():
    i = (t.src, t.dest, t.txid)
    d[t.txrx.lower() + '_time'][i] = t.time
    if pd.notnull(t.hops):
        d['hops'][i] = int(t.hops)

# reindex(columns=...) stands in for reindex_axis, which current pandas no longer provides
pd.DataFrame(d).rename_axis(['src', 'dest', 'txid']) \
    .reset_index().reindex(columns=cols)

    tx_time   rx_time  src  dest  txid  hops
0  34355146  34373907    2     1     1     1
1  44284813  44302557    2     1     2     1
2  50058251  50075994    2     1     3     1
3  44596500  44630682    3     1     2     2
4  51338658  51382629    3     1     3     2

Source
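To act on the "check for yourself" advice, a rough timing harness along these lines could be used. This is only a sketch: the function names `with_pivot` and `with_defaultdict` are made up for illustration, the dictionary approach is slightly simplified, and timings on a ten-row sample say little about behaviour on a real capture, so it is worth running against your own data.

```python
import timeit
from collections import defaultdict

import pandas as pd

nan = float("nan")
df = pd.DataFrame({
    "time": [34355146, 34373907, 44284813, 44302557, 44596500,
             44630682, 50058251, 50075994, 51338658, 51382629],
    "node": [2, 1, 2, 1, 3, 1, 2, 1, 3, 1],
    "txrx": ["TX", "RX"] * 5,
    "src":  [2, 2, 2, 2, 3, 3, 2, 2, 3, 3],
    "dest": [1] * 10,
    "txid": [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "hops": [nan, 1.0, nan, 1.0, nan, 2.0, nan, 1.0, nan, 2.0],
})


def with_pivot(frame):
    """pivot_table approach: back-fill hops, then spread TX/RX into columns."""
    return (frame.assign(hops=frame["hops"].bfill())
                 .pivot_table(index=["src", "dest", "txid", "hops"],
                              columns="txrx", values="time")
                 .reset_index())


def with_defaultdict(frame):
    """Dictionary approach: collect values per (src, dest, txid) key."""
    d = defaultdict(dict)
    for t in frame.itertuples():
        key = (t.src, t.dest, t.txid)
        d[t.txrx.lower() + "_time"][key] = t.time
        if pd.notnull(t.hops):
            d["hops"][key] = int(t.hops)
    return pd.DataFrame(d).rename_axis(["src", "dest", "txid"]).reset_index()


runs = 50
for fn in (with_pivot, with_defaultdict):
    seconds = timeit.timeit(lambda: fn(df), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```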

2017-10-10 16:33:44 piRSquared

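Thanks for the solution. In this case speed doesn't matter (it's just rearranging the table so I can plot it), so pivot_table is a bit easier. But I'll keep this in mind if I ever do any real-time processing. – mbadd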
