TensorFlow seq2seq: training time per minibatch increases monotonically

I am training a tensorflow.contrib.seq2seq encoder-decoder model, and the time it takes to train each minibatch increases monotonically:
```
Step Number: 10 Elapsed time: 52.89215302467346 Loss: 1.0420862436294556 Metrics: {'accuracy': 0.22499999}
Step Number: 20 Elapsed time: 60.28505992889404 Loss: 0.8007364869117737 Metrics: {'accuracy': 0.28}
Step Number: 30 Elapsed time: 73.98479580879211 Loss: 0.7292348742485046 Metrics: {'accuracy': 0.34}
Step Number: 40 Elapsed time: 82.99069213867188 Loss: 0.6843382120132446 Metrics: {'accuracy': 0.345}
Step Number: 50 Elapsed time: 86.97363901138306 Loss: 0.6808319687843323 Metrics: {'accuracy': 0.38999999}
Step Number: 60 Elapsed time: 106.96697807312012 Loss: 0.601255476474762 Metrics: {'accuracy': 0.44}
Step Number: 70 Elapsed time: 124.17725801467896 Loss: 0.5971778035163879 Metrics: {'accuracy': 0.405}
Step Number: 80 Elapsed time: 137.91252613067627 Loss: 0.596596896648407 Metrics: {'accuracy': 0.43000001}
Step Number: 90 Elapsed time: 146.6834409236908 Loss: 0.5921837687492371 Metrics: {'accuracy': 0.42500001}
```
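Since the timer is reset after every report (as clarified in the comment thread at the end), each logged value is the wall-clock cost of one 10-step interval. A quick check over the values (rounded, copied from the log above) confirms they are strictly increasing:

```python
# Per-10-step wall-clock times (seconds), rounded from the log above.
interval_times = [52.89, 60.29, 73.98, 82.99, 86.97, 106.97, 124.18, 137.91, 146.68]

# Strictly increasing: every 10-step interval costs more than the last.
is_monotonic = all(a < b for a, b in zip(interval_times, interval_times[1:]))
print(is_monotonic)  # → True

# The last interval is nearly 3x the first, so this is not measurement noise.
print(round(interval_times[-1] / interval_times[0], 2))  # → 2.77
```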
All of my data is artificially generated and randomly sampled, which means that (in general) there should be no difference between minibatches early in training and minibatches later in training. Additionally, all of my data has the same input sequence length and the same output sequence length. Why does my model take longer to train on later minibatches?
I found this related post, but I am not changing the computational graph inside my training loop.
To show some code, let's start in `main`:
```
def main(_):
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_data_pipeline()
    model = import_model()
    train(model=model, x_minibatch=x_minibatch, y_minibatch=y_minibatch,
          y_lengths_minibatch=y_lengths_minibatch)
```
My data is stored as `SequenceExample`s, one per `TFRecord` file. My `construct_data_pipeline()` function is defined as follows:
```
def construct_data_pipeline():
    # extract TFRecord filenames located in data directory
    tfrecord_filenames = []
    for dirpath, dirnames, filenames in os.walk(tf.app.flags.FLAGS.data_dir):
        for filename in filenames:
            if filename.endswith('.tfrecord'):
                tfrecord_filenames.append(os.path.join(dirpath, filename))

    # read and parse data from TFRecords into tensors
    x, y, x_len, y_len = construct_examples_queue(tfrecord_filenames)

    # group tensors into minibatches
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_minibatches(x=x, y=y,
                                                                          y_len=y_len,
                                                                          x_len=x_len)

    return x_minibatch, y_minibatch, y_lengths_minibatch
```
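The filename scan at the top of `construct_data_pipeline()` can be exercised on its own, independent of TensorFlow. A minimal sketch, with a throwaway temp directory standing in for `FLAGS.data_dir`:

```python
import os
import tempfile

def find_tfrecords(data_dir):
    """Mirror of the filename scan in construct_data_pipeline()."""
    tfrecord_filenames = []
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for filename in filenames:
            if filename.endswith('.tfrecord'):
                tfrecord_filenames.append(os.path.join(dirpath, filename))
    return tfrecord_filenames

with tempfile.TemporaryDirectory() as data_dir:
    # One matching file and one non-matching file.
    open(os.path.join(data_dir, 'a.tfrecord'), 'w').close()
    open(os.path.join(data_dir, 'notes.txt'), 'w').close()
    found = find_tfrecords(data_dir)
    print(len(found))  # → 1
```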
Stepping into `construct_examples_queue()`:
```
def construct_examples_queue(tfrecords_filenames):
    number_of_readers = tf.flags.FLAGS.number_of_readers

    with tf.name_scope('examples_queue'):
        key, example_serialized = tf.contrib.slim.parallel_reader.parallel_read(tfrecords_filenames,
                                                                                tf.TFRecordReader,
                                                                                num_readers=number_of_readers)
        x, y, x_len, y_len = parse_example(example_serialized)

    return x, y, x_len, y_len
```
I don't think I can show `parse_example`, since the data is not my own. The main part is that I specify what I expect the `SequenceExample` to contain and then call
```
context_parsed, sequence_parsed = tf.parse_single_sequence_example(example_serialized,
                                                                   context_features=context_features,
                                                                   sequence_features=sequence_features)
```
Skipping ahead to how I construct minibatches, I use:
```
def construct_minibatches(x, y, y_len, x_len,
                          bucket_boundaries=list(range(400, tf.app.flags.FLAGS.max_x_len, 100))):
    batch_size = tf.app.flags.FLAGS.batch_size

    with tf.name_scope('batch_examples_using_buckets'):
        _, outputs = tf.contrib.training.bucket_by_sequence_length(input_length=x_len,
                                                                   tensors=[x, y, y_len],
                                                                   batch_size=batch_size,
                                                                   bucket_boundaries=bucket_boundaries,
                                                                   dynamic_pad=True,
                                                                   capacity=2 * batch_size,
                                                                   allow_smaller_final_batch=True)

        x_minibatch = outputs[0]
        y_minibatch = outputs[1]
        y_lengths_minibatch = outputs[2]

    return x_minibatch, y_minibatch, y_lengths_minibatch
```
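For intuition, `bucket_by_sequence_length` routes each example to a length bucket and batches within buckets, so padding only goes up to the bucket boundary rather than the global maximum. A rough pure-Python stand-in for just the bucket-assignment step (assuming a made-up `max_x_len` of 800 for illustration):

```python
import bisect

# Boundaries as in construct_minibatches(), with max_x_len assumed to be 800.
bucket_boundaries = list(range(400, 800, 100))  # [400, 500, 600, 700]

def bucket_of(length, boundaries=bucket_boundaries):
    """Index of the length bucket a sequence of `length` falls into:
    bucket 0 is [0, 400), bucket 1 is [400, 500), ..., the last is [700, inf)."""
    return bisect.bisect_right(boundaries, length)

print(bucket_of(350))  # → 0  (shorter than every boundary)
print(bucket_of(400))  # → 1  (exactly on a boundary goes to the next bucket)
print(bucket_of(750))  # → 4  (overflow bucket past the last boundary)
```

This is only an illustrative sketch of the bucketing idea, not the exact contrib implementation.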
Note: I had to change some variable names for privacy reasons. Hopefully I didn't make any mistakes.
Silly question, but are you sure the elapsed time isn't cumulative since training started? What produces "Elapsed time"? – vega
The loss is also steadily decreasing. – NRitH
Elapsed time is initialized with `start_time = time.time()`. Then, after training on 10 minibatches, I call `print(time.time() - start_time)` followed by `start_time = time.time()`. –
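The measurement pattern described in that comment can be sketched as follows; `train_one_minibatch` is a hypothetical stand-in for the real training step:

```python
import time

def train_one_minibatch(step):
    # Hypothetical stand-in for the real sess.run(...) training step.
    pass

interval_times = []
start_time = time.time()
for step in range(1, 31):
    train_one_minibatch(step)
    if step % 10 == 0:
        # Each recorded value is the cost of the last 10 steps only,
        # because the timer is reset immediately after reporting.
        interval_times.append(time.time() - start_time)
        start_time = time.time()

print(len(interval_times))  # → 3
```

Because the timer is reset each time, the logged "Elapsed time" values are per-interval costs, not a cumulative total, which is what makes the monotonic growth surprising.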