2017-04-01 103 views

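Filter a Pandas DataFrame using a list of conditions

I would like a function that takes a list of conditions of arbitrary length and puts an ampersand (&) between all of them. Example code is below.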

import pandas as pd

df = pd.DataFrame(columns=['Sample', 'DP', 'GQ', 'AB'],
                  data=[
                      ['HG_12_34', 200, 35, 0.4],
                      ['HG_12_34_2', 50, 45, 0.9],
                      ['KD_89_9', 76, 67, 0.7],
                      ['KD_98_9_2', 4, 78, 0.02],
                      ['LG_3_45', 90, 3, 0.8],
                      ['LG_3_45_2', 15, 12, 0.9]
                  ])


def some_func(df, cond_list): 

    # wrap ampersand between multiple conditions 
    all_conds = ? 

    return df[all_conds] 

cond1 = df['DP'] > 40 
cond2 = df['GQ'] > 40 
cond3 = df['AB'] < 0.4 


some_func(df, [cond1, cond2]) # should return df[cond1 & cond2] 
some_func(df, [cond1, cond3, cond2]) # should return df[cond1 & cond3 & cond2] 

I would appreciate any help.

Answer


You can use functools.reduce:

from functools import reduce 

def some_func(df, cond_list): 
    return df[reduce(lambda x,y: x&y, cond_list)]
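
To make the fold explicit: reduce(lambda x, y: x & y, [c1, c2, c3]) evaluates (c1 & c2) & c3. One case the version above does not cover is an empty list, where reduce raises a TypeError because there is nothing to start from. Here is a minimal sketch of one way around that, assuming an empty list should mean "keep every row". That convention and the name filter_all are my assumptions, not part of the question.

```python
from functools import reduce

import pandas as pd


def filter_all(df, cond_list):
    """Keep the rows of df for which every condition in cond_list is True."""
    # Assumed convention: start from an all-True mask, so an empty
    # cond_list returns df unchanged instead of raising TypeError.
    keep_all = pd.Series(True, index=df.index)
    return df[reduce(lambda x, y: x & y, cond_list, keep_all)]


# Cut-down illustrative data, not the full frame from the question.
df = pd.DataFrame({'DP': [200, 50, 4], 'GQ': [35, 45, 78]})
cond1 = df['DP'] > 40
cond2 = df['GQ'] > 40

both = filter_all(df, [cond1, cond2])   # same rows as df[cond1 & cond2]
nothing_filtered = filter_all(df, [])   # all rows kept
```

If you would rather have an empty list be an error, the original two-argument reduce call already behaves that way.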

Alternatively, as @AryaMcCarthy said, you can use and_ from the operator module:

from functools import reduce 
from operator import and_ 

def some_func(df, cond_list): 
    return df[reduce(and_, cond_list)]
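
The reason and_ works here is that operator.and_(a, b) is just the function form of a & b, which pandas evaluates element-wise. Python's and keyword is a different thing: it asks for the truth value of a whole Series, and pandas refuses to give one. A small sketch on made-up data (the Series s1 and s2 are illustrative, not from the question):

```python
from operator import and_

import pandas as pd

s1 = pd.Series([True, True, False])
s2 = pd.Series([True, False, False])

# operator.and_ is the function form of the & operator.
elementwise = and_(s1, s2)

# The `and` keyword calls bool() on s1, which pandas rejects as ambiguous.
try:
    s1 and s2
    keyword_and_failed = False
except ValueError:
    keyword_and_failed = True
```

So a condition list has to be combined with & (or and_), never with the and keyword.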

Or with numpy, which, as @ayhan said, also has a logical-and reduction:

from numpy import logical_and 

def some_func(df, cond_list): 
    return df[logical_and.reduce(cond_list)]
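
One difference worth knowing, as far as I can tell: logical_and.reduce stacks the conditions into an array first, so the mask comes back as a plain NumPy boolean array rather than a Series, and the conditions are combined by position instead of being aligned on the index. For conditions built from the same DataFrame, as in the question, that makes no difference to the result. A quick sketch on a cut-down frame (the three-row data is illustrative):

```python
import numpy as np
import pandas as pd

# Cut-down illustrative data, not the full frame from the question.
df = pd.DataFrame({'DP': [200, 50, 4], 'GQ': [35, 45, 78]})
cond1 = df['DP'] > 40
cond2 = df['GQ'] > 40

# Reduces along axis 0, i.e. across the conditions, one value per row.
mask = np.logical_and.reduce([cond1, cond2])

# Indexing with a boolean array of matching length selects rows.
filtered = df[mask]
```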

All three versions produce the following output for your sample input:

>>> some_func(df, [cond1, cond2])
       Sample  DP  GQ   AB
1  HG_12_34_2  50  45  0.9
2     KD_89_9  76  67  0.7
>>> some_func(df, [cond1, cond2, cond3]) 
Empty DataFrame 
Columns: [Sample, DP, GQ, AB] 
Index: [] 
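
If you want to check that claim yourself, here is a self-contained sketch that rebuilds the sample frame and compares the three variants with DataFrame.equals. The helper names via_lambda, via_and_ and via_numpy are mine, just to keep the three versions apart.

```python
from functools import reduce
from operator import and_

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['Sample', 'DP', 'GQ', 'AB'],
                  data=[
                      ['HG_12_34', 200, 35, 0.4],
                      ['HG_12_34_2', 50, 45, 0.9],
                      ['KD_89_9', 76, 67, 0.7],
                      ['KD_98_9_2', 4, 78, 0.02],
                      ['LG_3_45', 90, 3, 0.8],
                      ['LG_3_45_2', 15, 12, 0.9],
                  ])

cond1 = df['DP'] > 40
cond2 = df['GQ'] > 40
cond3 = df['AB'] < 0.4


def via_lambda(df, cond_list):
    return df[reduce(lambda x, y: x & y, cond_list)]


def via_and_(df, cond_list):
    return df[reduce(and_, cond_list)]


def via_numpy(df, cond_list):
    return df[np.logical_and.reduce(cond_list)]


# Run all three variants on the same condition list and compare them.
conds = [cond1, cond2]
results = [f(df, conds) for f in (via_lambda, via_and_, via_numpy)]
all_equal = all(r.equals(results[0]) for r in results)
kept_samples = results[0]['Sample'].tolist()
```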

It would probably be better to use 'operator.and_' instead of your custom lambda.

@AryaMcCarthy: Yes, that is indeed neater.

Or, from numpy: 'np.logical_and.reduce([cond1,cond2,cond3])' – ayhan
