Frequency counts for each subarray or slice in a 3D NumPy array

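I am trying to get the frequency counts (excluding zeros) for each subarray of a 3D NumPy array. However, the scipy.stats.itemfreq tool returns the frequency counts for the whole array as a single 2D array.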

What I get is:

array_3d= array([[[1, 0, 0], 
    [1, 0, 0], 
    [0, 2, 0]], 

    [[0, 0, 0], 
    [0, 0, 3], 
    [3, 3, 3]], 

    [[0, 0, 4], 
    [0, 0, 4], 
    [0, 0, 4]]]) 

>>> itemfreq(array_3d)[1:,] 
# outputs 
array([[ 1, 2], 
    [ 2, 1], 
    [ 3, 4], 
    [ 4, 3]], dtype=int64)

Whereas I would like the output to be:

array([[ 1, 2, 2, 1], 
    [ 3, 4], 
    [ 4, 3]], dtype=object)

The idea is that the odd positions always hold the unique values and the even positions hold their frequencies.

Another possible output would be:

array([[ 1, 2, 0], 
    [ 2, 1, 0], 
    [ 3, 4, 1], 
    [ 4, 3, 2]], dtype=int64)

where the third column gives the index of the subarray within the 3D array.

I am also open to other outputs/solutions!

Thanks in advance!

Source

2016-04-29 Wilmar van Ommeren

Can you describe the higher-level problem you are trying to solve? –

The numpy_indexed package (disclaimer: I am its author) can be used to solve this problem in an elegant and vectorized manner:

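import numpy as np
import numpy_indexed as npi 
# Label every element with the index of the subarray it belongs to
index = np.arange(array_3d.size) // array_3d[0].size 
# Count the unique (value, subarray index) pairs
(value, index), count = npi.count((array_3d.flatten(), index))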

This then gives:

index = [0 0 0 1 1 2 2] 
value = [0 1 2 0 3 0 4] 
count = [6 2 1 5 4 6 3]

This can be post-processed by indexing with value > 0 if required.
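The filtering step mentioned above can be sketched with plain NumPy. The arrays below are copied from the output shown above rather than recomputed, and the three-column layout (value, count, subarray index) is the alternative format from the question; `keep` and `out` are names chosen here for illustration only.

```python
import numpy as np

# Arrays copied from the output shown above (not recomputed here).
index = np.array([0, 0, 0, 1, 1, 2, 2])
value = np.array([0, 1, 2, 0, 3, 0, 4])
count = np.array([6, 2, 1, 5, 4, 6, 3])

# Boolean mask that drops the entries for the value zero.
keep = value > 0

# Columns: unique value, its count, index of the subarray it came from.
out = np.column_stack((value[keep], count[keep], index[keep]))
print(out)
```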

Source

2016-04-29 10:45:14

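I think the OP is looking for counts per subarray or slice. So "flattening" would contradict the desired output. – Divakar

What does this give for the example; what are the counts over 'array_3d.flatten()' and 'index'? –

Ah, I see, unique combinations! Nice. What about avoiding the zeros? Skip the first element of the output? – Divakar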

Approach #1

Here is a vectorized approach using NumPy broadcasting -

# Get unique non-zero elements 
unq = np.unique(array_3d[array_3d!=0]) 

# Get matches mask corresponding to all array_3d elements against all unq elements 
mask = array_3d == unq[:,None,None,None] 

# Get the counts 
sums = mask.sum(axis=(2,3)).T 

# Indices of non-zero(valid) counts 
Rvalid,Cvalid = np.where(sums!=0) 

# Finally, present the output in the desired format 
out = np.column_stack((unq[Cvalid],sums[sums!=0],Rvalid))

Please note that this will be a resource-hungry approach.
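To see where the memory goes, here is a small sketch that inspects the broadcast mask for the sample array from the question. It only illustrates the scaling; the mask holds one boolean per (unique value, element) pair, so it grows with the number of distinct non-zero values times the total number of elements.

```python
import numpy as np

# Sample array from the question.
array_3d = np.array([[[1, 0, 0], [1, 0, 0], [0, 2, 0]],
                     [[0, 0, 0], [0, 0, 3], [3, 3, 3]],
                     [[0, 0, 4], [0, 0, 4], [0, 0, 4]]])

unq = np.unique(array_3d[array_3d != 0])
mask = array_3d == unq[:, None, None, None]

# Shape is (number of unique values,) + array_3d.shape.
print(mask.shape)
# Booleans take one byte each, so nbytes equals unq.size * array_3d.size.
print(mask.nbytes)
```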

Approach #2

Here is another approach that is less resource-hungry and might therefore be preferred -

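# Flatten each subarray into a row and sort each row
a2d = np.sort(array_3d.reshape(array_3d.shape[0],-1),axis=1) 
# Mark the first occurrence of each non-zero value within every sorted row
start_mask = np.column_stack((a2d[:,0] !=0,np.diff(a2d,axis=1)>0)) 

# Offset each row so that non-zero values from different rows cannot collide
# (assumes non-negative integer values)
unqID = a2d + ((np.arange(a2d.shape[0])*a2d.max())[:,None]) 
# Count each (row, value) pair, ignoring zeros
count = np.unique(unqID[a2d!=0],return_counts=True)[1] 
# Columns: unique value, count, subarray index
out = np.column_stack((a2d[start_mask],count,np.where(start_mask)[0]))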

Please note: count could alternatively be computed with np.bincount, which might be faster, like so -

C = np.bincount(unqID[a2d!=0]) 
count = C[C!=0]
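As a cross-check for the vectorized versions, a plain per-slice loop over np.unique can serve as a reference. This is a minimal sketch rather than part of the original answer; `rows` and `expected` are illustrative names, and the loop will be slow for very many subarrays.

```python
import numpy as np

# Sample array from the question.
array_3d = np.array([[[1, 0, 0], [1, 0, 0], [0, 2, 0]],
                     [[0, 0, 0], [0, 0, 3], [3, 3, 3]],
                     [[0, 0, 4], [0, 0, 4], [0, 0, 4]]])

rows = []
for i, sub in enumerate(array_3d):
    # Unique non-zero values of this subarray and how often each occurs.
    vals, cnts = np.unique(sub[sub != 0], return_counts=True)
    # Columns: unique value, count, subarray index.
    rows.append(np.column_stack((vals, cnts, np.full(vals.size, i))))

expected = np.vstack(rows)
print(expected)
```

Comparing `expected` with `out` from either approach via np.array_equal is a quick way to validate them on small inputs.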

Source

2016-04-29 10:38:57 Divakar

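Thanks for your comment. This method works perfectly when array_3d is relatively small. However, my array_3d has 89528 subarrays. Most likely that is why the mask does not work, because the mask array becomes very large. –

@WilmarvanOmmeren Yes, that would definitely be a resource-hungry approach. – Divakar

@WilmarvanOmmeren Could you take a look at the second approach that was just added? Thanks! – Divakar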

pandas gives an intuitive way to get this result too:

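import numpy as np
import pandas as pd
df = pd.DataFrame(array_3d.reshape(3,9)) 
stats = df.apply(lambda x : np.unique(x,return_counts=True),axis=1) 
# Drop the zero entry (assumes every row contains a zero), then interleave values and counts
result = stats.apply(lambda x : np.vstack(x)[:,1:].T.ravel())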

This gives:

#stats 
0 ([0, 1, 2], [6, 2, 1]) 
1   ([0, 3], [5, 4]) 
2   ([0, 4], [6, 3]) 

#result 
0 [1, 2, 2, 1] 
1   [3, 4] 
2   [4, 3]
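If the three-column layout from the question is preferred, a groupby on a long-format frame is one possible pandas route. This is a sketch under assumptions, not the answer's method: the column names `subarray`, `value` and `count` and the variable names `flat`, `long_df` and `counts` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample array from the question.
array_3d = np.array([[[1, 0, 0], [1, 0, 0], [0, 2, 0]],
                     [[0, 0, 0], [0, 0, 3], [3, 3, 3]],
                     [[0, 0, 4], [0, 0, 4], [0, 0, 4]]])

# One row per subarray, then one record per element in long format.
flat = array_3d.reshape(array_3d.shape[0], -1)
long_df = pd.DataFrame({
    "subarray": np.repeat(np.arange(flat.shape[0]), flat.shape[1]),
    "value": flat.ravel(),
})

# Drop zeros and count each (subarray, value) pair.
long_df = long_df[long_df["value"] != 0]
counts = (long_df.groupby(["subarray", "value"])
                 .size()
                 .reset_index(name="count"))
print(counts[["value", "count", "subarray"]])
```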

Source

2016-04-29 11:53:30
