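For an LSTM model trained with TensorFlow, I have structured my data in the tf.train.SequenceExample format and stored it in a TFRecord file. I would now like to use the new Dataset API to generate padded batches for training. The documentation has an example that uses padded_batch, but I can't figure out what value padded_shapes should have for my data. How do I create padded batches for tf.train.SequenceExample data using the Dataset API in TensorFlow?
To read the TFRecord file into batches, I wrote the following Python code: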
import math
import tensorflow as tf
import numpy as np
import struct
import sys
import array

if len(sys.argv) != 2:
    print("Usage: createbatches.py [TFRecord file]")
    sys.exit(0)

vectorSize = 40
inFile = sys.argv[1]

def parse_function_dataset(example_proto):
    # Each record holds a sequence of 40-dimensional input vectors
    # and one integer label per time step.
    sequence_features = {
        'inputs': tf.FixedLenSequenceFeature(shape=[vectorSize],
                                             dtype=tf.float32),
        'labels': tf.FixedLenSequenceFeature(shape=[],
                                             dtype=tf.int64)}
    _, sequence = tf.parse_single_sequence_example(
        example_proto, sequence_features=sequence_features)
    length = tf.shape(sequence['inputs'])[0]
    return sequence['inputs'], sequence['labels']

sess = tf.InteractiveSession()
filenames = tf.placeholder(tf.string, shape=[None])
dataset = tf.contrib.data.TFRecordDataset(filenames)
dataset = dataset.map(parse_function_dataset)
# dataset = dataset.batch(1)
dataset = dataset.padded_batch(4, padded_shapes=[None])
iterator = dataset.make_initializable_iterator()
batch = iterator.get_next()

# Initialize `iterator` with training data.
training_filenames = [inFile]
sess.run(iterator.initializer, feed_dict={filenames: training_filenames})
print(sess.run(batch))
The code works fine if I use dataset = dataset.batch(1) (no padding is needed in that case), but when I use the padded_batch variant, I get the following error:
TypeError: If shallow structure is a sequence, input must also be a sequence. Input has type: .
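Can you help me figure out what I should pass for the padded_shapes parameter?
(I know there is plenty of example code that uses threads and queues for this, but I would rather use the new Dataset API for this project.)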
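The error message suggests that padded_shapes has to follow the same nested structure as each dataset element. Since parse_function_dataset returns a pair (inputs, labels), padded_shapes would presumably need to be a pair of shapes as well, one per component. The sketch below is not TensorFlow code: it is a plain-Python emulation with hypothetical helpers (pad_to, pad_batch, make_example) and toy data, meant only to show what padding to the shapes ([None, 40], [None]) would produce.

```python
# Plain-Python emulation of padded batching. pad_to, pad_batch and
# make_example are hypothetical helpers, not part of TensorFlow.

VECTOR_SIZE = 40  # plays the role of vectorSize in the script above


def pad_to(rows, length, fill):
    """Return `rows` extended with `fill` entries up to `length` items."""
    return rows + [fill] * (length - len(rows))


def pad_batch(elements, padded_shapes):
    """Pad a list of (inputs, labels) pairs to a common sequence length.

    padded_shapes mirrors the element structure: one shape per component.
    A None dimension means "pad to the longest sequence in this batch".
    """
    # Unpacking fails unless padded_shapes is a pair, like the elements.
    inputs_shape, labels_shape = padded_shapes
    longest = max(len(inputs) for inputs, _ in elements)
    input_len = longest if inputs_shape[0] is None else inputs_shape[0]
    label_len = longest if labels_shape[0] is None else labels_shape[0]
    zero_vector = [0.0] * VECTOR_SIZE
    batch_inputs = [pad_to(inputs, input_len, zero_vector)
                    for inputs, _ in elements]
    batch_labels = [pad_to(labels, label_len, 0) for _, labels in elements]
    return batch_inputs, batch_labels


def make_example(steps):
    """Build a toy sequence with `steps` time steps (made-up data)."""
    inputs = [[1.0] * VECTOR_SIZE for _ in range(steps)]
    labels = [1] * steps
    return inputs, labels


# Four sequences of different lengths form one batch of size 4.
examples = [make_example(n) for n in (2, 5, 3, 1)]
batch_inputs, batch_labels = pad_batch(
    examples, padded_shapes=([None, VECTOR_SIZE], [None]))

# Batch size, padded sequence length, feature vector size.
print(len(batch_inputs), len(batch_inputs[0]), len(batch_inputs[0][0]))
# Labels of the shortest sequence after padding.
print(batch_labels[3])
```

In the TensorFlow script, the corresponding call would presumably be dataset.padded_batch(4, padded_shapes=([None, vectorSize], [None])), with None standing for the variable time dimension of both components.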
Thanks, Marijn! Your question helped me a lot!