2016-07-06 93 views

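Merging duplicate indices in a sparse tensor

Let's say I have a sparse tensor with duplicate indices, and wherever an index is duplicated I want to merge the values (sum them up). What is the best way to do this?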

For example:

indices = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]

sparse_tensor = tf.SparseTensor(indices, values, shape=[10, 10])

result = tf.MAGIC(sparse_tensor)  # placeholder for the operation being asked about

The result should be a sparse tensor (or a dense one!) with the following values:

indices = [[1, 1], [1, 2], [1, 3]]
values = [1, 5, 4] 
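
To pin down the expected behaviour independently of TensorFlow, here is a minimal pure-Python sketch of the merge. The helper name `merge_duplicates` is hypothetical and is not a TensorFlow function; it only illustrates the result being asked for.

```python
from collections import OrderedDict


def merge_duplicates(indices, values):
    """Sum the values that share an index, keeping first-seen order."""
    merged = OrderedDict()
    for index, value in zip(indices, values):
        key = tuple(index)  # lists are not hashable, tuples are
        merged[key] = merged.get(key, 0) + value
    return [list(key) for key in merged], list(merged.values())


indices = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
merged_indices, merged_values = merge_duplicates(indices, values)
print(merged_indices)  # [[1, 1], [1, 2], [1, 3]]
print(merged_values)   # [1, 5, 4]
```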

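The only thing I have thought of is to string-concatenate the indices to create an index hash, put that hash in a third dimension, and then reduce-sum over that third dimension.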

indices = [[1, 1, 11], [1, 2, 12], [1, 2, 12], [1, 3, 13]]
sparse_result = tf.sparse_reduce_sum(sparseTensor, reduction_axes=2, keep_dims=True)

But this feels very, very ugly.

Answer

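Here is a solution using tf.segment_sum. The idea is to linearize the indices into 1-D space, get the unique indices with tf.unique, run tf.segment_sum, and convert the indices back to N-D space.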

import tensorflow as tf

indices = tf.constant([[1, 1], [1, 2], [1, 2], [1, 3]])
values = tf.constant([1, 2, 3, 4])

# Linearize the indices. If the dimensions of original array are 
# [N_{k}, N_{k-1}, ... N_0], then simply matrix multiply the indices 
# by [..., N_1 * N_0, N_0, 1]^T. For example, if the sparse tensor 
# has dimensions [10, 6, 4, 5], then multiply by [120, 20, 5, 1]^T 
# In your case, the dimensions are [10, 10], so multiply by [10, 1]^T 

linearized = tf.matmul(indices, [[10], [1]]) 

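# Get the unique indices, and their positions in the array
y, idx = tf.unique(tf.squeeze(linearized, axis=1))

# Use the positions of the unique values as the segment ids to
# sum the values that share an index. tf.segment_sum requires sorted
# segment ids, which holds here because the indices are already in
# row-major order; for unsorted indices, use
# tf.unsorted_segment_sum(values, idx, tf.size(y)) instead.
values = tf.segment_sum(values, idx)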

# Go back to N-D indices 
y = tf.expand_dims(y, 1) 
indices = tf.concat([y//10, y%10], axis=1) 

tf.InteractiveSession() 
print(indices.eval()) 
print(values.eval()) 
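
The multipliers [10, 1] and the `y//10, y%10` step above are hard-coded for a [10, 10] tensor. As a rough sketch of how the same linearize/unravel arithmetic extends to any shape, here it is in plain Python; the helper names `row_major_strides`, `linearize` and `unravel` are made up for illustration and are not TensorFlow APIs.

```python
def row_major_strides(shape):
    """Multipliers for flattening an N-D index, e.g. [10, 6, 4, 5] -> [120, 20, 5, 1]."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def linearize(index, strides):
    """Dot product of an N-D index with the strides gives its 1-D position."""
    return sum(i * s for i, s in zip(index, strides))


def unravel(flat, strides):
    """Inverse of linearize: repeated integer division and remainder."""
    index = []
    for s in strides:
        index.append(flat // s)
        flat %= s
    return index


strides = row_major_strides([10, 10])
print(strides)                      # [10, 1]
print(linearize([1, 2], strides))   # 12
print(unravel(12, strides))         # [1, 2]
```

In the TensorFlow version, the strides would become the column vector passed to tf.matmul, and the unravel loop would become one `//` and `%` pair per dimension before the final concat.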

This is much nicer than what I had in mind – dtracers