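How to handle very sparse vectors in Tensorflow
What is the best way to handle sparse vectors of size about 30 000, where every index is zero except one index with the value one (a 1-HOT vector)?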
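For scale, a back-of-the-envelope comparison of what a dense 1-HOT row costs versus the single integer id it encodes. This is a plain-Python sketch that assumes float32 one-hot values (the default dtype of `tf.one_hot`) and int32 ids; the function names are hypothetical, for illustration only.

```python
# Back-of-the-envelope memory cost: dense one-hot rows vs. plain integer ids.
# Assumptions: float32 one-hot values (tf.one_hot's default dtype) and int32 ids.
N_CLASSES = 30_000        # vocabulary size quoted in the question
BYTES_PER_FLOAT32 = 4
BYTES_PER_INT32 = 4


def dense_one_hot_bytes(seq_len: int, n_classes: int = N_CLASSES) -> int:
    """Bytes for seq_len dense float32 one-hot rows of width n_classes."""
    return seq_len * n_classes * BYTES_PER_FLOAT32


def id_bytes(seq_len: int) -> int:
    """Bytes for the same seq_len positions stored as int32 class ids."""
    return seq_len * BYTES_PER_INT32


if __name__ == "__main__":
    # The 4-step input sequence from the example row below (1, 4, 38, 571).
    print(dense_one_hot_bytes(4))  # 480000
    print(id_bytes(4))             # 16
```

Each position carries the same information either way; the dense form is simply 30 000 times larger.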
In my dataset I have a sequence of values, and I convert each value into a 1-HOT vector. This is what I currently do:
# Create some queues to read data from .csv files
...
# Parse example(/line) from the data file
example = tf.decode_csv(value, record_defaults=record_defaults)
# example now looks like (e.g.) [[5], [1], [4], [38], [571], [9]]
# [5] indicates the length of the sequence
# 1, 4, 38, 571 is the input sequence
# 4, 38, 571, 9 is the target sequence
# Create 1-HOT vectors for each value in the sequence
sequence_length = example[0]
one_hots = example[1:]
one_hots = tf.reshape(one_hots, [-1])
one_hots = tf.one_hot(one_hots, depth=n_classes)
# Grab the first values as the input features and the last values as target
features = one_hots[:-1]
targets = one_hots[1:]
...
# The sequence_length, features and targets are added to a list
# and the list is sent into a batch with tf.train.batch_join(...).
# So now I can get batches and feed into my RNN
...
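The input/target shift in the code above works just as well on the integer ids as on the one-hot rows, so the `tf.one_hot` call does not have to happen at parse time. A plain-Python sketch of that split on the example row (the name `split_example` is hypothetical); in TensorFlow the same slices would presumably apply to the reshaped int32 tensor before any one-hot encoding.

```python
from typing import List, Tuple


def split_example(row: List[int]) -> Tuple[int, List[int], List[int]]:
    """Split a decoded row [length, v1, ..., vn] into
    (sequence_length, input_ids, target_ids) without one-hot encoding."""
    sequence_length = row[0]
    values = row[1:]
    input_ids = values[:-1]   # every value except the last
    target_ids = values[1:]   # every value except the first (shifted by one)
    return sequence_length, input_ids, target_ids


if __name__ == "__main__":
    print(split_example([5, 1, 4, 38, 571, 9]))
    # (5, [1, 4, 38, 571], [4, 38, 571, 9])
```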
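This works, but I believe it can be done more efficiently. I looked at SparseTensor, but I could not figure out how to create SparseTensors from the example tensor that I get from tf.decode_csv. I also read somewhere that it is better to parse the data after it has been retrieved in a batch; is that still true?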
Here is a pastebin of the full code. Line 32 is my latest way of creating the 1-HOT vectors.
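A one-hot sequence is the simplest kind of sparse matrix: row t holds a single 1 at column ids[t]. `tf.SparseTensor(indices, values, dense_shape)` is described by exactly such a coordinate (COO) triple, so creating a SparseTensor from the example comes down to building that triple from the ids. The sketch below does it in plain Python (`one_hot_coo` and `densify` are hypothetical names; `densify` only exists to check tiny cases). Building the same triple inside the graph, e.g. by stacking `tf.range` with the id tensor and casting to int64, is an assumption not tested here.

```python
from typing import List, Tuple

Triple = Tuple[List[List[int]], List[float], List[int]]


def one_hot_coo(ids: List[int], n_classes: int) -> Triple:
    """COO description of len(ids) one-hot rows: row t has 1.0 at column ids[t].
    Indices come out in row-major order because each row holds one entry."""
    indices = [[t, class_id] for t, class_id in enumerate(ids)]
    values = [1.0] * len(ids)
    dense_shape = [len(ids), n_classes]
    return indices, values, dense_shape


def densify(indices: List[List[int]], values: List[float],
            dense_shape: List[int]) -> List[List[float]]:
    """Expand a 2-D COO triple into dense rows (only sensible for small shapes)."""
    n_rows, n_cols = dense_shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for (row, col), value in zip(indices, values):
        dense[row][col] = value
    return dense


if __name__ == "__main__":
    indices, values, shape = one_hot_coo([1, 4, 38, 571], 30_000)
    print(indices)  # [[0, 1], [1, 4], [2, 38], [3, 571]]
    print(shape)    # [4, 30000]
    print(densify(*one_hot_coo([2, 0], 3)))  # [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```

Only four coordinates are stored for the whole sequence, regardless of the 30 000-wide second dimension.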
Or: how can I create the 1-HOT vectors **after** I have pulled a batch of examples (if I just pass example directly into the batch)?
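`tf.one_hot` accepts indices of any rank and, with the default `axis=-1`, appends the depth axis at the end, so one-hot encoding after batching should come down to a single call on a `[batch_size, max_time]` id tensor, producing `[batch_size, max_time, n_classes]`. The plain-Python sketch below only illustrates that shape logic; the name `batch_one_hot` and the padding id -1 are assumptions. Like `tf.one_hot`, it turns an out-of-range id such as -1 into an all-zero row.

```python
from typing import List


def one_hot_row(class_id: int, depth: int) -> List[float]:
    """Dense one-hot row; an id outside [0, depth), e.g. -1, yields all zeros."""
    return [1.0 if col == class_id else 0.0 for col in range(depth)]


def batch_one_hot(batch_ids: List[List[int]], depth: int) -> List[List[List[float]]]:
    """Map a [batch, time] grid of ids to a [batch, time, depth] grid of one-hot rows."""
    return [[one_hot_row(class_id, depth) for class_id in seq] for seq in batch_ids]


if __name__ == "__main__":
    # Two sequences padded to length 3 with the hypothetical pad id -1.
    batch = [[1, 2, 0],
             [3, -1, -1]]
    out = batch_one_hot(batch, depth=4)
    print(len(out), len(out[0]), len(out[0][0]))  # 2 3 4
    print(out[1])  # [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
```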