numpy: convert an array of categorical strings to an integer array

I would like to convert a string array of categorical variables into an integer array of categorical variables.

Ex.

import numpy as np 
a = np.array(['a', 'b', 'c', 'a', 'b', 'c']) 
print a.dtype 
>>> |S1 

b = np.unique(a) 
print b 
>>> ['a' 'b' 'c'] 

c = a.desired_function(b) 
print c, c.dtype 
>>> [1,2,3,1,2,3] int32

I realize this can be done with a loop, but I imagine there is an easier way. Thanks.

Source

2010-07-03 wroscoe

Well, this is a hack... but does it help?

In [72]: c=(a.view(np.ubyte)-96).astype('int32') 

In [73]: print(c,c.dtype) 
(array([1, 2, 3, 1, 2, 3]), dtype('int32'))

Source

2010-07-03 19:15:51 unutbu

You seriously want to add the caveat that this approach only works for strings of length 1. – smci 2013-07-23 12:23:39
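To illustrate the caveat in the comment above, here is a minimal sketch. It assumes byte strings (dtype 'S1', as in the question's Python 2 session); the array named `longer` is made up for illustration and is not from the original answer.

```python
import numpy as np

# Byte strings of length 1 (dtype 'S1'): each element occupies exactly one byte.
a = np.array([b'a', b'b', b'c', b'a', b'b', b'c'])
# ord('a') == 97, so subtracting 96 maps 'a' -> 1, 'b' -> 2, 'c' -> 3.
codes = (a.view(np.uint8) - 96).astype('int32')

# Hypothetical length-2 byte strings (dtype 'S2'): the view exposes two bytes
# per element, so the result no longer lines up one-to-one with the input.
longer = np.array([b'aa', b'bb', b'cc'])
raw = longer.view(np.uint8)

print(codes, codes.dtype)
print(longer.shape, raw.shape)
```

The trick also relies on the categories being consecutive lowercase ASCII letters; for anything else a lookup-based encoding is the safer choice.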

One way is to use the categorical function from scikits.statsmodels. For example:

In [60]: from scikits.statsmodels.tools import categorical 

In [61]: a = np.array(['a', 'b', 'c', 'a', 'b', 'c']) 

In [62]: b = categorical(a, drop=True) 

In [63]: b.argmax(1) 
Out[63]: array([0, 1, 2, 0, 1, 2])

The return value of categorical (b) is actually a design matrix, hence the call to argmax above to get it close to your desired format.

In [64]: b 
Out[64]: 
array([[ 1., 0., 0.], 
     [ 0., 1., 0.], 
     [ 0., 0., 1.], 
     [ 1., 0., 0.], 
     [ 0., 1., 0.], 
     [ 0., 0., 1.]])
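For readers without scikits.statsmodels installed, the same idea (build a design matrix, then take argmax along each row) can be sketched with NumPy alone. This is not the `categorical` implementation, just an illustration of why argmax recovers the integer codes; the names `labels` and `design` are mine.

```python
import numpy as np

a = np.array(['a', 'b', 'c', 'a', 'b', 'c'])
labels = np.unique(a)  # sorted unique categories

# Broadcast comparison: row i has a 1.0 in the column of a[i]'s category
# and 0.0 everywhere else, i.e. a one-hot design matrix.
design = (a[:, None] == labels[None, :]).astype(float)

# Each row has exactly one 1.0, so argmax returns that column's index.
codes = design.argmax(axis=1)

print(design)
print(codes)
```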

Source

2010-07-10 05:12:10 ars

Neat and clever. Thanks. – unutbu 2010-07-10 11:34:04

np.unique has some optional return values

return_inverse gives the integer encoding, which I use very often

>>> b, c = np.unique(a, return_inverse=True) 
>>> b 
array(['a', 'b', 'c'], 
     dtype='|S1') 
>>> c 
array([0, 1, 2, 0, 1, 2]) 
>>> c+1 
array([1, 2, 3, 1, 2, 3])

It can also be used to recreate the original array from the unique values

>>> b[c] 
array(['a', 'b', 'c', 'a', 'b', 'c'], 
     dtype='|S1') 
>>> (b[c] == a).all() 
True
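As a self-contained sketch, the same call works unchanged for strings longer than one character. The color names below are made-up sample data, not from the question; note that the codes follow the sorted order of the unique values, not the order of first appearance.

```python
import numpy as np

# Hypothetical sample data with multi-character categories.
a = np.array(['red', 'green', 'blue', 'green', 'red'])

# labels: sorted unique values; codes: index into labels for each element of a.
labels, codes = np.unique(a, return_inverse=True)

# Round trip: indexing labels with codes rebuilds the original array.
restored = labels[codes]

print(labels)
print(codes)
print((restored == a).all())
```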

Source

2010-07-14 20:24:54 user333700

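... years later ...

For completeness (because this isn't mentioned in the answers) and for personal reasons (I always have pandas imported in my modules, but not necessarily sklearn), this can also be done with pandas.get_dummies()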

import numpy as np 
import pandas 

In [1]: a = np.array(['a', 'b', 'c', 'a', 'b', 'c']) 

In [2]: b = pandas.get_dummies(a) 

In [3]: b 
Out[3]: 
     a b c 
    0 1 0 0 
    1 0 1 0 
    2 0 0 1 
    3 1 0 0 
    4 0 1 0 
    5 0 0 1 

In [4]: b.values.argmax(1) 
Out[4]: array([0, 1, 2, 0, 1, 2])

Source

2015-09-01 17:27:10 benjaminmgross

Thanks. Finally found the answer I was looking for. – SeeTheC 2017-04-07 09:05:50

Another fairly simple way is to use pandas factorize to map the items to numbers:

In [1]: import numpy as np 
In [2]: import pandas as pd 
In [3]: a = np.array(['a', 'b', 'c', 'a', 'b', 'c']) 
In [4]: a_enc = pd.factorize(a) 
In [5]: a_enc[0] 
Out[5]: array([0, 1, 2, 0, 1, 2]) 
In [6]: a_enc[1] 
Out[6]: array(['a', 'b', 'c'], dtype=object)
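One detail that may matter: by default factorize numbers the categories in order of first appearance, whereas np.unique sorts them. A small sketch with made-up, deliberately unsorted input shows the difference; the `sort=True` argument is part of pd.factorize, while the variable names are mine.

```python
import numpy as np
import pandas as pd

# Hypothetical input where the first-seen order differs from sorted order.
a = np.array(['c', 'a', 'b', 'c', 'a'])

# Default: codes follow the order in which values first appear.
codes_seen, uniques_seen = pd.factorize(a)

# sort=True: codes follow the sorted order of the unique values.
codes_sorted, uniques_sorted = pd.factorize(a, sort=True)

print(codes_seen, list(uniques_seen))
print(codes_sorted, list(uniques_sorted))
```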

Source

2016-05-09 18:08:41 tomp

... some more years later ...

Thought I would provide a pure Python solution for completeness:

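def count_unique(a): 
    def counter(item, c=[0], items={}): 
        # c and items are re-created on every count_unique call,
        # because the inner def (and its defaults) is evaluated each time
        if item not in items: 
            items[item] = c[0] 
            c[0] += 1 
        return items[item] 
    # list() so the result is a list on Python 3 as well
    return list(map(counter, a)) 

a = [0, 2, 6, 0, 2] 
print(count_unique(a)) 
>> [0, 1, 2, 0, 1]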
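A shorter variant of the same idea, offered only as a sketch and not as the answer author's code, keeps the mapping in a plain dict and uses dict.setdefault. The function name `encode_first_seen` is hypothetical.

```python
def encode_first_seen(values):
    """Map each distinct value to an integer in order of first appearance."""
    mapping = {}
    # len(mapping) is evaluated before the key is inserted, so a new value
    # receives the next unused integer; a known value keeps its existing code.
    return [mapping.setdefault(v, len(mapping)) for v in values]


codes = encode_first_seen(['a', 'b', 'c', 'a', 'b', 'c'])
print(codes)
```

Like the version above, this numbers categories by first appearance rather than in sorted order.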

Source

2017-09-21 11:27:40 kezzos
