2017-08-10 111 views

np.where with multiple variables

I have a dataframe:

customer_id [1,2,3,4,5,6,7,8,9,10] 
feature1 [0,0,1,1,0,0,1,1,0,0] 
feature2 [1,0,1,0,1,0,1,0,1,0] 
feature3 [0,0,1,0,0,0,1,0,0,0] 

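Using this, I want to create a new variable (say new_var) such that when feature1 is 1, new_var = 1; else if feature2 = 1, new_var = 2; else if feature3 = 1, new_var = 3; else 4. I am trying np.where, and although it does not raise an error, it does not do the right thing, so I suspect that nested np.where only works for a single variable. What is the best way to do a nested if/case in pandas in this situation?

My np.where code looks like this: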

df['new_var'] = np.where(df['feature1']==1, '1', np.where(df['feature2']==1, '2', np.where(df['feature3']==1, '3', '4')))

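Just to sort of answer my own question: the np.where solution I mentioned trying works as well. The reason it was not giving me the right result is that the data type of feature1 was string, not integer. So for anyone looking at a similar problem, both the nested np.where solution and the numpy.select solution that jezrael mentioned work. – Shraddha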
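The dtype pitfall described in this comment can be reproduced with a minimal sketch. The small frame and the helper name nested_where below are made up for illustration, and the columns are forced to object dtype on the assumption that the 0/1 flags arrived as strings:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the 0/1 flags are stored as strings (object dtype).
df = pd.DataFrame({
    "feature1": ["0", "0", "1", "1"],
    "feature2": ["1", "0", "1", "0"],
    "feature3": ["0", "0", "1", "0"],
}, dtype=object)

def nested_where(frame):
    # Nested np.where: the first condition that holds decides the label.
    return np.where(frame["feature1"] == 1, "1",
           np.where(frame["feature2"] == 1, "2",
           np.where(frame["feature3"] == 1, "3", "4")))

# The string "1" is never equal to the integer 1, so every row gets the fallback.
before = nested_where(df)

# Converting the flag columns to integers makes the comparisons behave as intended.
after = nested_where(df.astype(int))

print(before)
print(after)
```

Checking df.dtypes before building the conditions is a cheap way to catch this kind of mismatch.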

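Answers

I think you need numpy.select. For each row it takes the choice belonging to the first condition that is True; any later conditions do not matter: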

m1 = df['feature1']==1 
m2 = df['feature2']==1  
m3 = df['feature3']==1 
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4') 

Sample:

import numpy as np
import pandas as pd

customer_id = [1,2,3,4,5,6,7,8,9,10]
feature1 = [0,0,1,1,0,0,1,1,0,0] 
feature2 = [1,0,1,0,1,0,1,0,1,0] 
feature3 = [0,0,1,0,0,0,1,0,0,0] 

df = pd.DataFrame({'customer_id':customer_id, 
        'feature1':feature1, 
        'feature2':feature2, 
        'feature3':feature3}) 

m1 = df['feature1']==1 
m2 = df['feature2']==1  
m3 = df['feature3']==1 
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4') 
print (df) 
    customer_id feature1 feature2 feature3 new_var 
0   1   0   1   0  2 
1   2   0   0   0  4 
2   3   1   1   1  1 
3   4   1   0   0  1 
4   5   0   1   0  2 
5   6   0   0   0  4 
6   7   1   1   1  1 
7   8   1   0   0  1 
8   9   0   1   0  2 
9   10   0   0   0  4 

If the features contain only 1 and 0, it is also possible to convert 0 to False and 1 to True:

m1 = df['feature1'].astype(bool) 
m2 = df['feature2'].astype(bool) 
m3 = df['feature3'].astype(bool) 
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4') 
print (df) 
    customer_id feature1 feature2 feature3 new_var 
0   1   0   1   0  2 
1   2   0   0   0  4 
2   3   1   1   1  1 
3   4   1   0   0  1 
4   5   0   1   0  2 
5   6   0   0   0  4 
6   7   1   1   1  1 
7   8   1   0   0  1 
8   9   0   1   0  2 
9   10   0   0   0  4 
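As a small check of the first-match behaviour, here is a sketch on a made-up four-row frame (not the data from the question) that covers two edge cases: a row where every flag is 1 and a row where none is. It also compares the result with the nested np.where form:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: all flags set, two set, one set, none set.
df = pd.DataFrame({
    "feature1": [1, 0, 0, 0],
    "feature2": [1, 1, 0, 0],
    "feature3": [1, 1, 1, 0],
})

conditions = [df["feature1"] == 1, df["feature2"] == 1, df["feature3"] == 1]
choices = ["1", "2", "3"]

# np.select walks the conditions in order and keeps the first match per row;
# rows that match nothing receive the default.
selected = np.select(conditions, choices, default="4")

# The equivalent nested np.where, written out for comparison.
nested = np.where(conditions[0], "1",
         np.where(conditions[1], "2",
         np.where(conditions[2], "3", "4")))

print(selected)
print((selected == nested).all())
```

The order of the condition list is what encodes the priority, so reordering it changes which label wins when several flags are set.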

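Thanks @jezrael - it seems to work fine when I try it on this example, but not in my code, and I am trying to figure out why. Also, does this solution take only the first match when features 1, 2 and 3 are all set (e.g. row 3), and does it handle the case where none of them is 1? – Shraddha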

It works now! I had 0/1 stored as strings, which is why it returned the default value 4 every time. Thanks! – Shraddha

Glad I could help! Have a nice day! – jezrael