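I think you need numpy.select. For each row it takes the choice belonging to the first condition that is True, and any later conditions are ignored: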
m1 = df['feature1']==1
m2 = df['feature2']==1
m3 = df['feature3']==1
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
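To make the "first True wins" behaviour concrete, here is a minimal sketch on two overlapping masks. The arrays cond_a and cond_b and the labels are made up for illustration and are not part of the question's data.

```python
import numpy as np

# Two overlapping boolean masks (hypothetical data, for illustration only).
cond_a = np.array([True, False, True, False])
cond_b = np.array([True, True, False, False])

# For each position, np.select takes the choice of the first condition
# that is True; positions where no condition is True get the default.
labels = np.select([cond_a, cond_b], ['a', 'b'], default='none')
print(labels.tolist())
```

At position 0 both masks are True, so the earlier condition (cond_a) decides the label. This is why the order of the conditions in the list matters.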
Sample:
import numpy as np
import pandas as pd

customer_id = [1,2,3,4,5,6,7,8,9,10]
feature1 = [0,0,1,1,0,0,1,1,0,0]
feature2 = [1,0,1,0,1,0,1,0,1,0]
feature3 = [0,0,1,0,0,0,1,0,0,0]
df = pd.DataFrame({'customer_id':customer_id,
                   'feature1':feature1,
                   'feature2':feature2,
                   'feature3':feature3})
m1 = df['feature1']==1
m2 = df['feature2']==1
m3 = df['feature3']==1
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
print (df)
   customer_id  feature1  feature2  feature3 new_var
0            1         0         1         0       2
1            2         0         0         0       4
2            3         1         1         1       1
3            4         1         0         0       1
4            5         0         1         0       2
5            6         0         0         0       4
6            7         1         1         1       1
7            8         1         0         0       1
8            9         0         1         0       2
9           10         0         0         0       4
If the features contain only 1 and 0, you can convert 0 to False and 1 to True:
m1 = df['feature1'].astype(bool)
m2 = df['feature2'].astype(bool)
m3 = df['feature3'].astype(bool)
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
print (df)
   customer_id  feature1  feature2  feature3 new_var
0            1         0         1         0       2
1            2         0         0         0       4
2            3         1         1         1       1
3            4         1         0         0       1
4            5         0         1         0       2
5            6         0         0         0       4
6            7         1         1         1       1
7            8         1         0         0       1
8            9         0         1         0       2
9           10         0         0         0       4
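Just to sort of answer my own question: as I mentioned, I had also tried the np.where solution. It did not give me correct results because the data type of feature1 was string, not integer. So for anyone looking at a similar problem, both the 'nested np.where' solution and the 'numpy.select' solution jezrael mentioned work. – Shraddha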
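The pitfall described in this comment can be reproduced with a rough sketch like the one below. The frame df_str is hypothetical (three rows of string flags, forced to object dtype) and the nested np.where line is only one possible way to write that variant, not necessarily the exact code the commenter used.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where the flags were read in as strings, not integers.
df_str = pd.DataFrame({
    'feature1': pd.Series(['0', '1', '0'], dtype=object),
    'feature2': pd.Series(['1', '1', '0'], dtype=object),
    'feature3': pd.Series(['0', '1', '0'], dtype=object),
})

# Comparing the string '1' with the integer 1 never matches,
# so this mask is False everywhere and every row would fall to the default.
bad_mask = df_str['feature1'] == 1
print(bad_mask.any())

# Convert the columns to integers first, then build the masks.
df_int = df_str.astype(int)
m1 = df_int['feature1'] == 1
m2 = df_int['feature2'] == 1
m3 = df_int['feature3'] == 1

# numpy.select and a nested np.where express the same priority order.
selected = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
nested = np.where(m1, '1', np.where(m2, '2', np.where(m3, '3', '4')))
print(selected.tolist(), nested.tolist())
```

Checking df.dtypes before building the masks is a cheap way to catch this kind of mismatch early.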