
I am curious whether there is a good way to share the weights of different RNN cells while still feeding each cell a different input. How can I share weights across different RNN cells with different inputs in TensorFlow?

The graph I would like to build looks like this:

(architecture diagram)

There are three LSTM cells (in orange) that operate in parallel, and I would like to share the weights between them.

I have managed to implement something similar to what I want using a placeholder (see the code below). However, using a placeholder breaks the optimizer's gradient computation, and nothing past the point where the placeholder is used gets trained. Is it possible to do this in TensorFlow?
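To make the failure mode concrete, here is a minimal sketch (the names w1, hidden_place, etc. are made up for illustration, not from my model) of how feeding an intermediate result back in through a placeholder cuts the gradient path to everything upstream:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4])
w1 = tf.Variable(tf.random_normal([4, 3]))
hidden = tf.matmul(x, w1)                             # upstream part of the graph

hidden_place = tf.placeholder(tf.float32, [None, 3])  # re-entry point for the fetched value
w2 = tf.Variable(tf.random_normal([3, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(hidden_place, w2)))

# The gradient w.r.t. w1 is None: the placeholder severs the path,
# so the optimizer never updates anything before it.
print(tf.gradients(loss, [w1, w2]))                   # -> [None, <Tensor ...>]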

I am using TensorFlow 1.2 and Python 3.5 in an Anaconda environment on Windows 7.

Code:

def ann_model(cls, data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data, ann_weights) + ann_biases)
    return out

def rnn_lower_model(cls, data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length, 1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i], [cls.batch_size,
                                                     cls.n_rnn_inputs])

        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)

        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)

        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                tf.random_normal([cls.n_rnn_nodes_lower, 1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))

        # Encode the output of the RNN into one estimate per entry in
        # the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i],
                                          out_weights)
                                + out_biases)
    return predict_list

def create_graph(cls, sess):
    # Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float', [cls.batch_size,
                                         cls.sequence_length,
                                         cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float', [cls.batch_size, 1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float', [cls.batch_size, 1])

    # Define placeholder to provide variable input into the
    # RNNs with shared weights
    cls.input_place = tf.placeholder('float', [cls.batch_size,
                                               cls.sequence_length,
                                               cls.n_rnn_inputs])

    # global step used in optimizer
    global_step = tf.Variable(0, trainable=False)

    # Create ANN
    ann_output = cls.ann_model(cls.c)
    # Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in
                                        range(cls.sequence_length)], 1),
                             [cls.batch_size,
                              cls.sequence_length,
                              cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq, cls.x], 2)

    # Create 'unrolled' RNN by creating sequence_length many RNN cells that
    # share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        # Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)] * 2

When training, each mini-batch is computed in two steps:

RNNinput = sess.run(cls.rnn_input, feed_dict={cls.x: batch_x,
                                              cls.y: batch_y,
                                              cls.c: batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place: RNNinput,
                                       cls.y: batch_y,
                                       cls.x: batch_x,
                                       cls.c: batch_c})

Thanks for your help. Any ideas would be appreciated.


Why do you have two feed_dicts? –


The second one is the same as the first, but it also includes 'RNNinput', the result of the first 'sess.run'. This is how I pass the output of the lower layer with the shared RNN cells up to the upper layer. I do this in the second 'sess.run' call via the placeholder 'cls.input_place'. Unfortunately, that breaks TensorFlow's backpropagation computation. – AlexR


You shouldn't do it that way. You can build a single graph as mentioned in the link, feed the inputs once, and let the whole network train. Is there any reason why you can't do that? –

Answers

1

You have three different inputs, input_1, input_2, input_3, which you feed into an LSTM model with shared parameters. You then concatenate the outputs of the three LSTMs and pass the result to a final LSTM layer. The code should look something like this:

# Create input placeholder for the network 
input_1 = tf.placeholder(...) 
input_2 = tf.placeholder(...) 
input_3 = tf.placeholder(...) 

# create a shared rnn layer 
def shared_rnn(...): 
    ... 
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...) 

# generate the outputs for each input 
with tf.variable_scope('lower_lstm') as scope: 
    out_input_1 = shared_rnn(...) 
    scope.reuse_variables() # the variables will be reused. 
    out_input_2 = shared_rnn(...) 
    scope.reuse_variables() 
    out_input_3 = shared_rnn(...) 

# verify whether the variables are reused 
for v in tf.global_variables(): 
    print(v.name) 

# concat the three outputs 
output = tf.concat... 

# Pass it to the final_lstm layer and out the logits 
logits = final_layer(output, ...) 

train_op = ... 

# train 
sess.run(train_op, feed_dict={input_1: in1, input_2: in2, input_3: in3, labels: ...})
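If it helps, one possible body for shared_rnn is sketched below (the n_hidden size and the input format are assumptions, not taken from the question's model); the only requirement is that the cell and the static_rnn call live inside the reused variable scope:

import tensorflow as tf

def shared_rnn(inputs, n_hidden=64):
    # `inputs` is assumed to be a list of [batch, features] tensors,
    # one per time step, which is the format static_rnn expects.
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    outputs, state = tf.contrib.rnn.static_rnn(rnn_cell, inputs, dtype=tf.float32)
    # Return only the last output; the three results are concatenated outside.
    return outputs[-1]

Because the cell is built inside the reused 'lower_lstm' scope, the second and third calls bind to the same LSTM kernel and bias instead of creating new variables.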

Thanks. This is more like what I wanted to do. – AlexR

0

I ended up rethinking my architecture a bit and came up with a more workable solution.

Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The result of each run is stored in a tf.Variable-like "buffer", and that whole variable is then used as the input to the final LSTM layer. I drew a diagram here

Implementing it this way allowed valid outputs after 3 time steps and did not break TensorFlow's backpropagation algorithm (i.e. the nodes in the ANN could still be trained).

The only tricky part was making sure that the buffer was ordered correctly for the final RNN.
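Roughly, the idea looks like the sketch below. The sizes and placeholder inputs are just stand-ins (in the real model the three slices come from earlier parts of the graph), and it collects the three runs with tf.stack rather than an explicit tf.Variable buffer:

import tensorflow as tf

n_hidden, batch, steps, feats = 32, 8, 5, 10            # assumed sizes

def make_seq():
    # One input slice as a list of [batch, feats] tensors, the format static_rnn expects.
    x = tf.placeholder(tf.float32, [batch, steps, feats])
    return tf.unstack(x, steps, axis=1)

seq_1, seq_2, seq_3 = make_seq(), make_seq(), make_seq()

with tf.variable_scope('shared_lstm') as scope:
    cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    buffered = []
    for i, seq in enumerate([seq_1, seq_2, seq_3]):
        if i > 0:
            scope.reuse_variables()                      # 2nd and 3rd runs reuse the same weights
        outputs, _ = tf.contrib.rnn.static_rnn(cell, seq, dtype=tf.float32)
        buffered.append(outputs[-1])                     # keep the last output of each run

# The "buffer" of the three runs, ordered to match the final RNN's time axis.
final_input = tf.stack(buffered, axis=1)                 # shape [batch, 3, n_hidden]

Gradients flow back through final_input into the shared cell, so nothing upstream is cut off.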
