Convert two boolean columns to class IDs in pandas

I have two boolean columns:

import pandas as pd

df = pd.DataFrame([[True, True],
                   [True, False],
                   [False, True],
                   [True, True],
                   [False, False]],
                  columns=['col1', 'col2'])

I need to generate a new column that identifies which unique combination of values each row belongs to:

result = pd.Series([0, 1, 2, 0, 3])

It seems like there should be a very simple way to do this, but it escapes me. Maybe using sklearn.preprocessing? A plain pandas or NumPy solution would be just as welcome.

Edit: It would be great if the solution could be extended to more than 2 columns.


2017-03-04 Chris

The simplest way is to convert each row to a tuple and use factorize:

print(pd.Series(pd.factorize(df.apply(tuple, axis=1))[0]))
0 0 
1 1 
2 2 
3 0 
4 3 
dtype: int64
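Because the tuple holds every column, the same one-liner already covers the "more than 2 columns" edit. The sketch below checks that on a hypothetical third column (col3 is made up for illustration) and, as a swapped-in alternative not taken from this answer, compares it with groupby(..., sort=False).ngroup(), which numbers groups in order of first appearance and so should give the same labels here.

```python
import pandas as pd

# Hypothetical 3-column frame: col3 is invented to exercise the N-column case.
df = pd.DataFrame([[True, True, False],
                   [True, False, False],
                   [False, True, True],
                   [True, True, False],
                   [False, False, False]],
                  columns=['col1', 'col2', 'col3'])

# Tuple approach from the answer: one tuple per row, then factorize,
# which numbers distinct values in order of first appearance.
ids_tuple = pd.Series(pd.factorize(df.apply(tuple, axis=1))[0])

# groupby alternative: with sort=False the groups are numbered in order
# of first appearance, so ngroup() matches factorize here.
ids_group = df.groupby(list(df.columns), sort=False).ngroup()

print(ids_tuple.tolist())  # [0, 1, 2, 0, 3]
print(ids_group.tolist())  # [0, 1, 2, 0, 3]
```

The groupby form avoids building a Python tuple per row, which may matter on large frames; both versions take whatever columns you pass, so no code changes are needed as columns are added.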

Another solution, casting to string and summing across each row:

print(pd.Series(pd.factorize(df.astype(str).sum(axis=1))[0]))
0 0 
1 1 
2 2 
3 0 
4 3 
dtype: int64
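Summing strings concatenates them with no separator. For boolean columns that is safe, since "True" and "False" cannot run together ambiguously, but with other dtypes two different rows can produce the same string. The sketch below uses a hypothetical integer frame to show the collision and one way around it, joining with a separator ('|' is an arbitrary choice, assumed not to appear in the values).

```python
import pandas as pd

# Hypothetical integer columns where plain concatenation collides:
# "1" + "11" and "11" + "1" both become "111".
df = pd.DataFrame({'a': [1, 11], 'b': [11, 1]})

# No separator: both rows map to the same string, hence the same ID.
joined_plain = df.astype(str).apply(''.join, axis=1)
ids_plain = pd.factorize(joined_plain)[0]

# With a separator the two rows stay distinct.
joined_sep = df.astype(str).apply('|'.join, axis=1)
ids_sep = pd.factorize(joined_sep)[0]

print(ids_plain.tolist())  # [0, 0]
print(ids_sep.tolist())    # [0, 1]
```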


2017-03-04 19:51:18 jezrael

This is what I was looking for. I knew there had to be a one-liner. Thanks! – Chris

Thanks, glad I could help! Good luck! – jezrael
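Since the question also asks for a NumPy option that scales past two columns, here is one more sketch, not from this thread, that relies on the columns being boolean: treat each row as a binary number (column j weighted by 2**j), then factorize the resulting integers to get IDs in order of first appearance. The int64 weights are an assumption that limits this to at most 63 columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[True, True],
                   [True, False],
                   [False, True],
                   [True, True],
                   [False, False]],
                  columns=['col1', 'col2'])

bits = df.to_numpy(dtype=bool)  # shape (n_rows, n_cols)

# Column j gets weight 2**j, so each combination maps to a distinct
# integer in [0, 2**n_cols). int64 weights allow at most 63 columns.
weights = 1 << np.arange(bits.shape[1], dtype=np.int64)
codes = bits.astype(np.int64) @ weights

# codes identify combinations but are not numbered 0, 1, 2, ... in order
# of appearance; factorize relabels them that way.
result = pd.Series(pd.factorize(codes)[0])
print(codes.tolist())   # [3, 1, 2, 3, 0]
print(result.tolist())  # [0, 1, 2, 0, 3]
```

If the labels do not need to follow first-appearance order, codes on its own already works as a class ID.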

I've never used pandas before, but here is a solution in plain Python; I'm sure it wouldn't be hard to adapt to pandas:

a = [[True, True], 
     [True, False], 
     [False, True], 
     [True, True], 
     [False, False]] 

ids, result = [], []  # ids: rows seen so far (list position = ID); result: one ID per input row

for x in a:
    if x in ids:  # x has been seen before
        row_id = ids.index(x)  # reuse its existing ID
        result.append(row_id)
    else:  # x hasn't been seen before
        row_id = len(ids)  # assign the next unused ID
        result.append(row_id)
        ids.append(x)

print(result)  # [0, 1, 2, 0, 3]

This works for any number of columns. To get the result as a Series, just use:

result = pd.Series(result)
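One possible pandas adaptation of the loop above, offered as a sketch rather than the answerer's code: iterate the DataFrame rows as plain tuples with itertuples(index=False, name=None). Tuples are hashable, so a dict (the name seen is illustrative) can replace the linear ids.index(x) search while keeping the same first-appearance numbering.

```python
import pandas as pd

df = pd.DataFrame([[True, True],
                   [True, False],
                   [False, True],
                   [True, True],
                   [False, False]],
                  columns=['col1', 'col2'])

seen = {}    # maps each row tuple to the ID it was first given
result = []

# itertuples(index=False, name=None) yields plain tuples, one per row.
for row in df.itertuples(index=False, name=None):
    if row not in seen:
        seen[row] = len(seen)  # next unused ID
    result.append(seen[row])

result = pd.Series(result, index=df.index)
print(result.tolist())  # [0, 1, 2, 0, 3]
```

Passing index=df.index keeps the IDs aligned with the original rows, so the Series can be assigned straight back as a new column.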


2017-03-04 19:48:58
