2017-06-22

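TensorFlow Seq2Seq training time per minibatch increases monotonically

I'm training a `tensorflow.contrib.seq2seq` encoder-decoder model, and the training time per minibatch increases monotonically: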

```
Step Number: 10 Elapsed time: 52.89215302467346 Loss: 1.0420862436294556 Metrics: {'accuracy': 0.22499999}
Step Number: 20 Elapsed time: 60.28505992889404 Loss: 0.8007364869117737 Metrics: {'accuracy': 0.28}
Step Number: 30 Elapsed time: 73.98479580879211 Loss: 0.7292348742485046 Metrics: {'accuracy': 0.34}
Step Number: 40 Elapsed time: 82.99069213867188 Loss: 0.6843382120132446 Metrics: {'accuracy': 0.345}
Step Number: 50 Elapsed time: 86.97363901138306 Loss: 0.6808319687843323 Metrics: {'accuracy': 0.38999999}
Step Number: 60 Elapsed time: 106.96697807312012 Loss: 0.601255476474762 Metrics: {'accuracy': 0.44}
Step Number: 70 Elapsed time: 124.17725801467896 Loss: 0.5971778035163879 Metrics: {'accuracy': 0.405}
Step Number: 80 Elapsed time: 137.91252613067627 Loss: 0.596596896648407 Metrics: {'accuracy': 0.43000001}
Step Number: 90 Elapsed time: 146.6834409236908 Loss: 0.5921837687492371 Metrics: {'accuracy': 0.42500001}
```
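
According to the asker's reply in the comments, the timer is reset after every 10 minibatches, so each "Elapsed time" value covers one 10-minibatch interval. One quick way to quantify the trend is to convert those values to seconds per minibatch and fit a least-squares line. The sketch below uses only the numbers from the log. The helper names are my own.

```python
# (step, seconds spent on the preceding 10 minibatches), taken from the log above
timings = [
    (10, 52.89215302467346),
    (20, 60.28505992889404),
    (30, 73.98479580879211),
    (40, 82.99069213867188),
    (50, 86.97363901138306),
    (60, 106.96697807312012),
    (70, 124.17725801467896),
    (80, 137.91252613067627),
    (90, 146.6834409236908),
]


def per_minibatch_seconds(timings, interval=10):
    """Average wall-clock seconds per minibatch for each logged interval."""
    return [seconds / interval for _, seconds in timings]


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


steps = [step for step, _ in timings]
per_batch = per_minibatch_seconds(timings)
slope = least_squares_slope(steps, per_batch)

print(f"first interval: {per_batch[0]:.2f} s per minibatch")
print(f"last interval:  {per_batch[-1]:.2f} s per minibatch")
print(f"trend: +{slope:.2f} s per minibatch for every additional step")
```

With these numbers, the per-minibatch time rises from about 5.3 s to about 14.7 s. The fitted slope is roughly 0.12 s per step. In other words, each minibatch costs a little more than the one before it, which is the pattern you would expect if something grows by a fixed amount on every step.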

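All of my data is synthetically generated and randomly sampled. This means there should (in general) be no difference between minibatches early in training and minibatches later in training. In addition, all of my data has the same input sequence length and the same output sequence length. Why does my model take longer to train on later minibatches?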

I found this related post, but I am not changing the computation graph inside my training loop.

To show some code, let's start with `main`:

```python
def main(_):
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_data_pipeline()

    model = import_model()

    train(model=model,
          x_minibatch=x_minibatch,
          y_minibatch=y_minibatch,
          y_lengths_minibatch=y_lengths_minibatch)
```

My data is stored as `SequenceExample`s, one per TFRecord file. My `construct_data_pipeline()` function is defined as follows:

```python
import os

import tensorflow as tf


def construct_data_pipeline():
    # extract TFRecord filenames located in data directory
    tfrecord_filenames = []
    for dirpath, dirnames, filenames in os.walk(tf.app.flags.FLAGS.data_dir):
        for filename in filenames:
            if filename.endswith('.tfrecord'):
                tfrecord_filenames.append(os.path.join(dirpath, filename))

    # read and parse data from TFRecords into tensors
    x, y, x_len, y_len = construct_examples_queue(tfrecord_filenames)

    # group tensors into minibatches
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_minibatches(
        x=x, y=y, y_len=y_len, x_len=x_len)

    return x_minibatch, y_minibatch, y_lengths_minibatch
```
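
The filename-collection loop at the top of `construct_data_pipeline()` is plain Python, so it can be checked in isolation. Below is a sketch of an equivalent that uses `pathlib`. `find_tfrecord_files` is a hypothetical helper name. Sorting the result is my own addition, included so the file order is reproducible between runs, because `os.walk` makes no ordering guarantee.

```python
from pathlib import Path


def find_tfrecord_files(data_dir):
    """Recursively collect regular files ending in '.tfrecord' under data_dir.

    The result is sorted so the order does not depend on the filesystem.
    """
    return sorted(
        str(path)
        for path in Path(data_dir).rglob("*.tfrecord")
        if path.is_file()
    )
```

In the pipeline above, this would replace the loop with `tfrecord_filenames = find_tfrecord_files(tf.app.flags.FLAGS.data_dir)`.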

Stepping into `construct_examples_queue()`:

```python
def construct_examples_queue(tfrecords_filenames):
    number_of_readers = tf.flags.FLAGS.number_of_readers

    with tf.name_scope('examples_queue'):
        key, example_serialized = tf.contrib.slim.parallel_reader.parallel_read(
            tfrecords_filenames,
            tf.TFRecordReader,
            num_readers=number_of_readers)

        x, y, x_len, y_len = parse_example(example_serialized)

        return x, y, x_len, y_len
```

I don't think I can show `parse_example`, since the data isn't mine. The main part is that I specify the features I expect the `SequenceExample` to contain, and then call

```python
context_parsed, sequence_parsed = tf.parse_single_sequence_example(
    example_serialized,
    context_features=context_features,
    sequence_features=sequence_features)
```

Skipping straight to how I build the minibatches, I use

```python
def construct_minibatches(x, y, y_len, x_len,
                          bucket_boundaries=list(range(400, tf.app.flags.FLAGS.max_x_len, 100))):

    batch_size = tf.app.flags.FLAGS.batch_size

    with tf.name_scope('batch_examples_using_buckets'):
        # bucket examples by their input sequence length (x_len)
        _, outputs = tf.contrib.training.bucket_by_sequence_length(
            input_length=x_len,
            tensors=[x, y, y_len],
            batch_size=batch_size,
            bucket_boundaries=bucket_boundaries,
            dynamic_pad=True,
            capacity=2 * batch_size,
            allow_smaller_final_batch=True)

        x_minibatch = outputs[0]
        y_minibatch = outputs[1]
        y_lengths_minibatch = outputs[2]
        return x_minibatch, y_minibatch, y_lengths_minibatch
```
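
As I read the `tf.contrib.training.bucket_by_sequence_length` implementation, an example goes into bucket `i` when `bucket_boundaries[i-1] <= input_length < bucket_boundaries[i]`, and the first and last buckets are open-ended. The pure-Python sketch below reproduces that assignment with `bisect`. `bucket_index` is a hypothetical helper, and the `max_x_len` value of 1000 is an assumption, since the question doesn't give it. Because all inputs here have the same length, every example should land in the same bucket. The bucketing stage on its own should therefore not make later minibatches slower.

```python
import bisect


def bucket_index(length, bucket_boundaries):
    """Bucket id for a sequence length, using half-open ranges
    (-inf, b0), [b0, b1), ..., [bN, +inf) over sorted boundaries."""
    # bisect_right counts how many boundaries are <= length
    return bisect.bisect_right(bucket_boundaries, length)


# Assumed max_x_len of 1000 (hypothetical): boundaries are [400, 500, ..., 900]
boundaries = list(range(400, 1000, 100))
for length in (399, 400, 450, 900, 5000):
    print(length, "-> bucket", bucket_index(length, boundaries))
```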

Note: I had to change some variable names for privacy reasons. Hopefully I didn't introduce any mistakes.

Silly question, but are you sure this isn't the time elapsed since the start of training? What produces "Elapsed time"? – vega

The loss is also decreasing steadily. – NRitH

Elapsed time is initialized with `start_time = time.time()`. Then, after training on 10 minibatches, I call `print(time.time() - start_time)` followed by `start_time = time.time()`. –
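
The timing scheme described in this comment measures each 10-minibatch block separately. The logged values are therefore per-interval durations, not time since the start of training. Below is a minimal sketch of that loop. `time_intervals` is a hypothetical helper, and `train_step` is a hypothetical stand-in for the `sess.run(...)` call. The sketch uses `time.perf_counter()`, which is monotonic and intended for measuring intervals, instead of `time.time()`.

```python
import time


def time_intervals(train_step, num_steps, every=10):
    """Call train_step num_steps times and return the wall-clock seconds
    spent on each complete block of `every` steps."""
    intervals = []
    start_time = time.perf_counter()
    for step in range(1, num_steps + 1):
        train_step()
        if step % every == 0:
            intervals.append(time.perf_counter() - start_time)
            start_time = time.perf_counter()  # reset the timer for the next block
    return intervals
```

For example, `time_intervals(lambda: None, 30)` returns three durations, one per block of 10 steps. A trailing partial block is not reported.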

Answer

Credit to faddy-w for solving both of my problems at once!

It turns out I was changing my computation graph without realizing it.

I was calling

```python
sess.run([model.optimizer.minimize(model.loss), model.y_predicted_logits],
         feed_dict={model.x: x_values,
                    model.y_actual: y_values,
                    model.y_actual_lengths: y_lengths_values})
```

from inside a loop, where

```python
model.loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=self.y_actual,
                                            logits=self.y_predicted_logits))

model.optimizer = tf.train.GradientDescentOptimizer(learning_rate=initial_learning_rate)
```

I didn't realize that `optimizer.minimize()` adds extra operations to my graph every time it is called.
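
Why this matters: `optimizer.minimize(loss)` computes gradients and creates an update op each time it is called. Calling it inside the loop therefore adds new nodes to the graph on every step, and every new fetch makes the session handle a larger graph. This is consistent with the steadily rising step times. In TF 1.x, the usual fix is to build `train_op = model.optimizer.minimize(model.loss)` once, before the loop, and fetch `train_op` in `sess.run`. Calling `tf.get_default_graph().finalize()` before the loop makes TensorFlow raise a `RuntimeError` if any later code tries to add an op.

The sketch below does not use TensorFlow. `ToyGraph`, `toy_minimize` and `toy_run` are hypothetical stand-ins that only show the structural difference: graph size growing with the step count versus staying fixed. They do not model TensorFlow's actual runtime cost.

```python
class ToyGraph:
    """Hypothetical stand-in for a TF 1.x graph (not TensorFlow's API): an
    append-only list of op names that can be frozen, loosely modelled on
    tf.Graph.finalize()."""

    def __init__(self):
        self.ops = []
        self.finalized = False

    def add_op(self, kind):
        if self.finalized:
            raise RuntimeError("Graph is finalized and cannot be modified.")
        name = f"{kind}_{len(self.ops)}"  # unique name for every new op
        self.ops.append(name)
        return name

    def finalize(self):
        self.finalized = True


def toy_minimize(graph):
    """Stands in for optimizer.minimize(loss): every call creates NEW ops
    (three here, an arbitrary number) and returns the new train op."""
    graph.add_op("gradients")
    graph.add_op("apply_gradients")
    return graph.add_op("train_op")


def toy_run(graph, op):
    """Stands in for sess.run(op); here it only checks that the op exists."""
    if op not in graph.ops:
        raise ValueError(f"{op!r} is not in the graph")


def train_buggy(graph, steps):
    for _ in range(steps):
        toy_run(graph, toy_minimize(graph))  # new ops are built on every step
    return len(graph.ops)


def train_fixed(graph, steps):
    train_op = toy_minimize(graph)  # build the training op once, before the loop
    graph.finalize()                # any later op creation now raises RuntimeError
    for _ in range(steps):
        toy_run(graph, train_op)
    return len(graph.ops)


print(train_buggy(ToyGraph(), 90))  # → 270 (grows with the number of steps)
print(train_fixed(ToyGraph(), 90))  # → 3 (stays constant)
```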