2010-10-06 91 views

NumPy - why a ValueError about NaN when trying to delete rows?

I have a NumPy array:

A = array([['id1', '1', '2', 'NaN'], 
      ['id2', '2', '0', 'NaN']]) 

I also have a list:

li = ['id1', 'id3', 'id6'] 

I want to iterate over the array and, wherever the first element of a row is not in the list, delete that entire row from the array.

My code so far:

from numpy import * 

for row in A: 
    if row[0] not in li: 
     delete(A, row, axis = 0) 

This returns the following error:

ValueError: invalid literal for int() with base 10: 'NaN' 

Every element in each row is of type str, so I don't understand why int() shows up in the error.

Any suggestions?

Thanks, S ;-)

Answers

5

Is simply creating a new array not an option?

numpy.array([x for x in A if x[0] in li]) 

Yes, much simpler than my solution! – eumiro 2010-10-06 14:19:13

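I think the original poster wants to keep the rows where 'row[0]' is in 'li', so the 'not' needs to be removed from the condition in the list comprehension. – dtlussier 2010-10-06 15:29:33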

@dtlussier: Thanks for pointing out my mistake. :) – atomocopter 2010-10-06 21:24:01

2

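It looks like you want to delete rows from the array in place. That is not possible with the np.delete function, because such an operation goes against the way Python and NumPy manage memory.

I found an interesting post on the NumPy mailing list (Travis Oliphant, [Numpy-discussion] Deleting a row from a matrix) where the np.delete function was first discussed: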

So, "in-place" deletion of array objects would not be particularly useful, because it would only work for arrays with no additional reference counts (i.e. simple b=a assignment would increase the reference count and make it impossible to say del a[obj]).

....

But, the problem with both of those approaches is that once you start removing arbitrary rows (or n-1 dimensional sub-spaces) from an array you very likely will no longer have a chunk of memory that can be described using the n-dimensional array memory model.
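To illustrate the point about the memory model, here is a small sketch (the array `a` and the variable names are made up for this example). A regularly strided slice can share memory with the original array, whereas a selection of arbitrary rows ends up in a fresh block:

```python
import numpy as np

# Hypothetical 4x3 array used only for illustration.
a = np.arange(12).reshape(4, 3)

# Every other row: a regular stride, so NumPy can return a view.
every_other = a[::2]

# Rows 0, 1 and 3: no single stride describes this, so NumPy copies.
arbitrary = a[[0, 1, 3]]

print(np.shares_memory(a, every_other))  # True
print(np.shares_memory(a, arbitrary))    # False
```

This assumes a NumPy version that provides `np.shares_memory`.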

If you look at the documentation for np.delete (http://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html), you can see that the function returns a new array with the requested parts (not necessarily rows) removed.

Definition:  np.delete(arr, obj, axis=None) 
Docstring: 
Return a new array with sub-arrays along an axis deleted. 

Parameters 
---------- 
arr : array_like 
    Input array. 
obj : slice, int or array of ints 
    Indicate which sub-arrays to remove. 
axis : int, optional 
    The axis along which to delete the subarray defined by `obj`. 
    If `axis` is None, `obj` is applied to the flattened array. 

Returns 
------- 
out : ndarray 
    A copy of `arr` with the elements specified by `obj` removed. Note 
    that `delete` does not occur in-place. If `axis` is None, `out` is 
    a flattened array. 
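As a quick check of what the docstring says, the sketch below (the name `B` is introduced here only for illustration) passes an integer row index as `obj` and confirms that the input array is left untouched. It also hints at the original error: `obj` is expected to be integer indices, so passing the row itself (an array of strings) presumably makes NumPy try to interpret those strings as integers.

```python
import numpy as np

A = np.array([['id1', '1', '2', 'NaN'],
              ['id2', '2', '0', 'NaN']])

# obj is the integer index of the row to drop, not the row itself.
B = np.delete(A, 1, axis=0)

print(A.shape)  # (2, 4)  the original is unchanged
print(B.shape)  # (1, 4)  the result is a new, smaller array
```

The exact exception raised when a non-integer `obj` is passed may differ between NumPy versions.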

So, in your case, I think you would want to do something like:

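import numpy as np

A = np.array([['id1', '1', '2', 'NaN'],
              ['id2', '2', '0', 'NaN']])

li = ['id1', 'id3', 'id6']

# Collect the indices of the rows whose first element is not in li,
# then delete them in a single call so that the indices do not shift
# while rows are being removed.
drop = [i for i, row in enumerate(A) if row[0] not in li]
A = np.delete(A, drop, axis=0)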

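A is now trimmed down to what you want, but keep in mind that it is a new block of memory. Each call to np.delete allocates new memory, and the name A then points to that new block.

I'm sure there is a better, vectorized way (maybe using masked arrays?) to work out which rows to delete, but I couldn't put one together. If anyone has one, please comment!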
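One possible vectorized approach, offered as a sketch rather than a definitive answer, is a boolean mask built with `np.isin` (this assumes a NumPy version that provides `np.isin`; the names `mask` and `kept` are made up here):

```python
import numpy as np

A = np.array([['id1', '1', '2', 'NaN'],
              ['id2', '2', '0', 'NaN']])

li = ['id1', 'id3', 'id6']

# True for each row whose first column appears in li.
mask = np.isin(A[:, 0], li)

# Boolean indexing keeps only the matching rows (and returns a new array).
kept = A[mask]

print(mask)  # [ True False]
print(kept)  # [['id1' '1' '2' 'NaN']]
```

Like np.delete, boolean indexing does not modify A in place. It builds a new array, which can be assigned back to the name A if desired.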
