Converting a 2D numpy array to a structured array

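I am trying to convert a two-dimensional array into a structured array with named fields. I would like each row in the 2D array to be a new record in the structured array. Unfortunately, nothing I have tried works the way I expect.

I am starting with: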

>>> myarray = numpy.array([("Hello",2.5,3),("World",3.6,2)]) 
>>> print myarray 
[['Hello' '2.5' '3'] 
['World' '3.6' '2']]

I want to convert it to something that looks like this:

>>> newarray = numpy.array([("Hello",2.5,3),("World",3.6,2)], dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")]) 
>>> print newarray 
[('Hello', 2.5, 3L) ('World', 3.6000000000000001, 2L)]

I have tried:

>>> newarray = myarray.astype([("Col1","S8"),("Col2","f8"),("Col3","i8")]) 
>>> print newarray 
[[('Hello', 0.0, 0L) ('2.5', 0.0, 0L) ('3', 0.0, 0L)] 
[('World', 0.0, 0L) ('3.6', 0.0, 0L) ('2', 0.0, 0L)]] 

>>> newarray = numpy.array(myarray, dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")]) 
>>> print newarray 
[[('Hello', 0.0, 0L) ('2.5', 0.0, 0L) ('3', 0.0, 0L)] 
[('World', 0.0, 0L) ('3.6', 0.0, 0L) ('2', 0.0, 0L)]]

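Both of these approaches attempt to convert each entry in myarray into a record with the given dtype, which is why the extra zeros are inserted. I can't figure out how to get it to convert each row into a record.

Another attempt: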

>>> newarray = myarray.copy() 
>>> newarray.dtype = [("Col1","S8"),("Col2","f8"),("Col3","i8")] 
>>> print newarray 
[[('Hello', 1.7219343871178711e-317, 51L)] 
[('World', 1.7543139673493688e-317, 50L)]]

This time no actual conversion is performed. The existing data in memory is simply reinterpreted as the new data type.
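The difference between reinterpreting memory and converting values can be seen on a small numeric array. This is only an illustrative sketch: the array `values` is made up for the example, and `view` is used here as a stand-in for assigning to `.dtype`, on the assumption that both reinterpret the same bytes.

```python
import numpy as np

# Hypothetical data, not from the question.
values = np.array([1.0, 2.0], dtype="f8")

# view() keeps the bytes and only changes how they are read.
reinterpreted = values.view("i8")

# astype() converts each value to the new type.
converted = values.astype("i8")

print(reinterpreted)  # bit patterns of 1.0 and 2.0 read as integers
print(converted)      # the numbers 1 and 2
```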

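The array I am starting with is read in from a text file. The data types are not known ahead of time, so I can't set the dtype at creation time. I need a high-performance and elegant solution that works well in the general case, since I will be doing this type of conversion many times for a wide variety of applications.

Thanks!

Source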

2010-09-01 Emma

You can "create a record array from a (flat) list of arrays" using numpy.core.records.fromarrays, as follows:

>>> import numpy as np 
>>> myarray = np.array([("Hello",2.5,3),("World",3.6,2)]) 
>>> print myarray 
[['Hello' '2.5' '3'] 
['World' '3.6' '2']] 


>>> newrecarray = np.core.records.fromarrays(myarray.transpose(), 
              names='col1, col2, col3', 
              formats = 'S8, f8, i8') 

>>> print newrecarray 
[('Hello', 2.5, 3) ('World', 3.5999999046325684, 2)]

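I was trying to do the same thing. I found that when numpy creates a structured array from an existing 2D array (using np.core.records.fromarrays), it iterates over the first axis and treats each row of the 2D array as one field of the result, not as one record. So you have to transpose the array first. This behavior does not look very intuitive, but it follows from fromarrays expecting one array per field.

Source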
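For readers on Python 3, here is a minimal sketch of the same call. It assumes the `np.rec` alias (used later in this thread) is available in your NumPy version; string fields come back as bytes because of the `S8` format.

```python
import numpy as np

# Mixed tuples become a 2D array of strings, as in the question.
myarray = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)])

# Transpose so that each item handed to fromarrays is one column (field).
rec = np.rec.fromarrays(
    myarray.T,
    names="col1, col2, col3",
    formats="S8, f8, i8",
)

print(rec.shape)  # one record per original row
print(rec.col2)   # fields are also reachable as attributes on a recarray
```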

2011-03-05 14:08:47 Curious2learn

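With 'fromrecords' you can avoid the 'transpose()' – 2014-04-01 21:10:16

Okay, I have been struggling with this for a while now, but I have found a way to do it that doesn't take too much effort. I apologise if this code is "dirty"...

Let's start with a 2D array: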

mydata = numpy.array([['text1', 1, 'longertext1', 0.1111], 
        ['text2', 2, 'longertext2', 0.2222], 
        ['text3', 3, 'longertext3', 0.3333], 
        ['text4', 4, 'longertext4', 0.4444], 
        ['text5', 5, 'longertext5', 0.5555]])

So we end up with a 2D array with 4 columns and 5 rows:

mydata.shape 
Out[30]: (5L, 4L)

To use numpy.core.records.array, we need to supply the input argument as a list of arrays, so:

tuple(mydata) 
Out[31]: 
(array(['text1', '1', 'longertext1', '0.1111'], 
     dtype='|S11'), 
array(['text2', '2', 'longertext2', '0.2222'], 
     dtype='|S11'), 
array(['text3', '3', 'longertext3', '0.3333'], 
     dtype='|S11'), 
array(['text4', '4', 'longertext4', '0.4444'], 
     dtype='|S11'), 
array(['text5', '5', 'longertext5', '0.5555'], 
     dtype='|S11'))

This produces a separate array per row of data, but the input arrays need to be per column, so what we need is:

tuple(mydata.transpose()) 
Out[32]: 
(array(['text1', 'text2', 'text3', 'text4', 'text5'], 
     dtype='|S11'), 
array(['1', '2', '3', '4', '5'], 
     dtype='|S11'), 
array(['longertext1', 'longertext2', 'longertext3', 'longertext4', 
     'longertext5'], 
     dtype='|S11'), 
array(['0.1111', '0.2222', '0.3333', '0.4444', '0.5555'], 
     dtype='|S11'))

Finally, it needs to be a list of arrays, not a tuple, so we wrap the above in list() as follows:

list(tuple(mydata.transpose()))

That takes care of the data input argument... next comes the dtype:

mydtype = numpy.dtype([('My short text Column', 'S5'), 
         ('My integer Column', numpy.int16), 
         ('My long text Column', 'S11'), 
         ('My float Column', numpy.float32)]) 
mydtype 
Out[37]: dtype([('My short text Column', '|S5'), ('My integer Column', '<i2'), ('My long text Column', '|S11'), ('My float Column', '<f4')])

OK, so now we can pass everything to numpy.core.records.array():

myRecord = numpy.core.records.array(list(tuple(mydata.transpose())), dtype=mydtype)

...and fingers crossed:

myRecord 
Out[36]: 
rec.array([('text1', 1, 'longertext1', 0.11110000312328339), 
     ('text2', 2, 'longertext2', 0.22220000624656677), 
     ('text3', 3, 'longertext3', 0.33329999446868896), 
     ('text4', 4, 'longertext4', 0.44440001249313354), 
     ('text5', 5, 'longertext5', 0.5554999709129333)], 
     dtype=[('My short text Column', '|S5'), ('My integer Column', '<i2'), ('My long text Column', '|S11'), ('My float Column', '<f4')])

Voila! You can index by column name, as in:

myRecord['My float Column'] 
Out[39]: array([ 0.1111 , 0.22220001, 0.33329999, 0.44440001, 0.55549997], dtype=float32)

I hope this helps, as I wasted a lot of time with numpy.asarray, mydata.astype and so on trying to get this to work before finally working out this method.

Source
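The steps in this answer can be condensed into a short Python 3 sketch. It is written under a few assumptions: `np.rec.array` is used as the public alias of `numpy.core.records.array`, the data is cut down to two rows, and the field names `short_text`, `integer`, `long_text` and `value` are shortened placeholders rather than the names used above.

```python
import numpy as np

# Two rows of the sample data; every cell is a string, as it is after np.array().
mydata = np.array([
    ["text1", "1", "longertext1", "0.1111"],
    ["text2", "2", "longertext2", "0.2222"],
])

# Hypothetical, shortened field names.
mydtype = np.dtype([
    ("short_text", "S5"),
    ("integer", np.int16),
    ("long_text", "S11"),
    ("value", np.float32),
])

# One array per column, passed as a list.
my_record = np.rec.array(list(mydata.T), dtype=mydtype)

print(my_record["integer"])
print(my_record["value"])
```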

2013-03-01 13:50:21

I guess

new_array = np.core.records.fromrecords([("Hello",2.5,3),("World",3.6,2)], 
             names='Col1,Col2,Col3', 
             formats='S8,f8,i8')

is what you want.

Source

2014-04-01 21:09:42

If the data starts out as a list of tuples, then creating a structured array is straightforward:

In [228]: alist = [("Hello",2.5,3),("World",3.6,2)] 
In [229]: dt = [("Col1","S8"),("Col2","f8"),("Col3","i8")] 
In [230]: np.array(alist, dtype=dt) 
Out[230]: 
array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)], 
     dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])

The complication here is that the list of tuples has been turned into a 2D string array:

In [231]: arr = np.array(alist) 
In [232]: arr 
Out[232]: 
array([['Hello', '2.5', '3'], 
     ['World', '3.6', '2']], 
     dtype='<U5')

We could use the well-known zip(*...) approach to "transpose" this array. Actually, we want a double transpose:

In [234]: list(zip(*arr.T)) 
Out[234]: [('Hello', '2.5', '3'), ('World', '3.6', '2')]

zip has conveniently given us a list of tuples. Now we can recreate the array with the desired dtype:

In [235]: np.array(_, dtype=dt) 
Out[235]: 
array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)], 
     dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
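The double transpose above amounts to turning each row of the 2D array into a tuple. A minimal sketch of that more direct spelling follows; the variable names `rows` and `structured` are introduced here only for illustration.

```python
import numpy as np

arr = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)])  # 2D array of strings
dt = [("Col1", "S8"), ("Col2", "f8"), ("Col3", "i8")]

# One tuple per row; tuples are what NumPy treats as records.
rows = [tuple(row) for row in arr]

# The strings are parsed into the numeric fields during construction.
structured = np.array(rows, dtype=dt)

print(structured["Col2"])
print(structured["Col3"])
```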

The accepted answer uses fromarrays:

In [236]: np.rec.fromarrays(arr.T, dtype=dt) 
Out[236]: 
rec.array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)], 
      dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])

Internally, fromarrays takes an approach that is common in recfunctions: create the target array, then copy values over by field name. Effectively it does:

In [237]: newarr = np.empty(arr.shape[0], dtype=dt) 
In [238]: for n, v in zip(newarr.dtype.names, arr.T): 
    ...:  newarr[n] = v 
    ...:  
In [239]: newarr 
Out[239]: 
array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)], 
     dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
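The question mentions that the column types are not known in advance. The create-then-copy loop above could be extended to guess them, as in the sketch below. The helper name `columns_to_structured` and the inference rule (try integer, then float, otherwise keep the strings) are assumptions made for this example, not part of NumPy or of the answers in this thread.

```python
import numpy as np

def columns_to_structured(arr2d, names):
    """Build a 1D structured array from a 2D string array, guessing column types."""
    columns = []
    for col in arr2d.T:
        # Assumed rule: prefer int, then float, otherwise leave as strings.
        for target in (np.int64, np.float64):
            try:
                col = col.astype(target)
                break
            except ValueError:
                continue
        columns.append(col)

    # One field per column, using whatever dtype the column ended up with.
    dt = [(name, col.dtype) for name, col in zip(names, columns)]
    out = np.empty(arr2d.shape[0], dtype=dt)
    for name, col in zip(names, columns):
        out[name] = col
    return out

arr = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)])
result = columns_to_structured(arr, ["Col1", "Col2", "Col3"])
print(result.dtype)
print(result)
```

Conversion is done once per column rather than once per cell, which keeps the work inside NumPy's array casts. Text columns stay as unicode strings here rather than `S8` bytes.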

Source

2017-06-02 21:56:58 hpaulj
