How to order and keep the common index labels from two DataFrames

I have two DataFrames:

import pandas as pd
import io
from scipy import stats


ctrl=u"""probegenes,sample1,sample2,sample3
1415777_at Pnliprp1,20,0.00,11
1415884_at Cela3b,47,0.00,100
1415805_at Clps,17,0.00,55
1115805_at Ckkk,77,10.00,5.5
"""

df_ctrl = pd.read_csv(io.StringIO(ctrl),index_col='probegenes')

test=u"""probegenes,sample1,sample2,sample3
1415777_at Pnliprp1,20.1,10.00,22.3
1415805_at Clps,7,3.00,1.5
1415884_at Cela3b,47,2.01,30"""

df_test = pd.read_csv(io.StringIO(test),index_col='probegenes')

They look like this:

In [35]: df_ctrl
Out[35]:
                     sample1  sample2  sample3
probegenes
1415777_at Pnliprp1       20        0     11.0
1415884_at Cela3b         47        0    100.0
1415805_at Clps           17        0     55.0
1115805_at Ckkk           77       10      5.5

In [36]: df_test
Out[36]:
                     sample1  sample2  sample3
probegenes
1415777_at Pnliprp1     20.1    10.00     22.3
1415805_at Clps          7.0     3.00      1.5
1415884_at Cela3b       47.0     2.01     30.0

I want to:

Get the index labels common to both DataFrames.
Reorder both DataFrames in the same way.

So in the end I get two new DataFrames:

new_df_ctrl

                     sample1  sample2  sample3
probegenes
1415884_at Cela3b         47        0    100.0
1415805_at Clps           17        0     55.0
1415777_at Pnliprp1       20        0     11.0


new_df_test

                     sample1  sample2  sample3
probegenes
1415884_at Cela3b       47.0     2.01     30.0
1415805_at Clps          7.0     3.00      1.5
1415777_at Pnliprp1     20.1    10.00     22.3

2016-05-18 neversaint

You can use join with the parameter how='inner' to get the common index. Then reindex each DataFrame with this common index.

idx = df_ctrl.join(df_test, rsuffix='_', how='inner').index

>>> df_ctrl.reindex(idx)
                     sample1  sample2  sample3
probegenes
1415777_at Pnliprp1       20        0       11
1415805_at Clps           17        0       55
1415884_at Cela3b         47        0      100

>>> df_test.reindex(idx)
                     sample1  sample2  sample3
probegenes
1415777_at Pnliprp1     20.1    10.00     22.3
1415805_at Clps          7.0     3.00      1.5
1415884_at Cela3b       47.0     2.01     30.0
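
For anyone who wants to run the join approach end to end, here is a minimal self-contained sketch using the data from the question. The names new_df_ctrl and new_df_test are taken from the question's desired output. The final checks only confirm that both results share one index and that the unmatched row is gone; they make no claim about which order the join produces.

```python
import io

import pandas as pd

ctrl = """probegenes,sample1,sample2,sample3
1415777_at Pnliprp1,20,0.00,11
1415884_at Cela3b,47,0.00,100
1415805_at Clps,17,0.00,55
1115805_at Ckkk,77,10.00,5.5
"""
test = """probegenes,sample1,sample2,sample3
1415777_at Pnliprp1,20.1,10.00,22.3
1415805_at Clps,7,3.00,1.5
1415884_at Cela3b,47,2.01,30
"""
df_ctrl = pd.read_csv(io.StringIO(ctrl), index_col="probegenes")
df_test = pd.read_csv(io.StringIO(test), index_col="probegenes")

# The inner join keeps only labels present in both frames.
# rsuffix is needed because the column names overlap; only the index is used.
idx = df_ctrl.join(df_test, rsuffix="_", how="inner").index

# Reindexing both frames with the same index gives them identical row order.
new_df_ctrl = df_ctrl.reindex(idx)
new_df_test = df_test.reindex(idx)

print(new_df_ctrl.index.equals(new_df_test.index))  # True
print("1115805_at Ckkk" in new_df_ctrl.index)       # False
```

Because both frames now carry the same labels in the same order, row-wise comparisons between them line up without further alignment.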

2016-05-18 02:56:01 Alexander

You can use pd.Index.intersection() and then select with either .loc[] or .reindex(). Call .sort_values() on the index to get the desired order:

idx = df_ctrl.index.intersection(df_test.index).sort_values(ascending=False)

df_ctrl.loc[idx]

                     sample1  sample2  sample3
probegenes
1415884_at Cela3b         47      0.0    100.0
1415805_at Clps           17      0.0     55.0
1415777_at Pnliprp1       20      0.0     11.0

df_test.loc[idx]

                     sample1  sample2  sample3
probegenes
1415884_at Cela3b       47.0     2.01     30.0
1415805_at Clps          7.0     3.00      1.5
1415777_at Pnliprp1     20.1    10.00     22.3
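
The intersection approach can also be wrapped in a small function if it is needed more than once. The sketch below is only an illustration: align_on_common_index is a hypothetical helper name, not part of pandas, and the frames are built directly from the question's values instead of being parsed from CSV.

```python
import pandas as pd


def align_on_common_index(left, right, ascending=False):
    """Restrict both frames to their shared index labels, in one sorted order.

    Hypothetical helper for illustration; not a pandas function.
    """
    idx = left.index.intersection(right.index).sort_values(ascending=ascending)
    return left.loc[idx], right.loc[idx]


df_ctrl = pd.DataFrame(
    {
        "sample1": [20, 47, 17, 77],
        "sample2": [0.0, 0.0, 0.0, 10.0],
        "sample3": [11, 100, 55, 5.5],
    },
    index=pd.Index(
        ["1415777_at Pnliprp1", "1415884_at Cela3b", "1415805_at Clps", "1115805_at Ckkk"],
        name="probegenes",
    ),
)
df_test = pd.DataFrame(
    {
        "sample1": [20.1, 7, 47],
        "sample2": [10.0, 3.0, 2.01],
        "sample3": [22.3, 1.5, 30],
    },
    index=pd.Index(
        ["1415777_at Pnliprp1", "1415805_at Clps", "1415884_at Cela3b"],
        name="probegenes",
    ),
)

new_df_ctrl, new_df_test = align_on_common_index(df_ctrl, df_test)
print(list(new_df_ctrl.index))
# ['1415884_at Cela3b', '1415805_at Clps', '1415777_at Pnliprp1']
```

Sorting the shared index explicitly makes the row order independent of the order in either input frame, which is what the question's desired output requires.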

2016-05-18 03:03:23 Stefan
