2017-04-26 136 views

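I want to use GloVe vectors as word representations while implementing a story generator. I am using a 2-layer LSTM with a fully connected layer for the softmax output. How do I add an embedding layer before the multi-layer LSTM?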

The architecture looks like this:

Input --> LSTM --> LSTM --> Fully connected --> Output 

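The model should take three words as input and output one word based on those three. Each input word is a vector of dimension 25. The text I use for training contains only 100 labels. Each LSTM has 512 hidden units.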
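One common way to put an embedding layer in front of the LSTMs is to feed integer word ids instead of pre-looked-up vectors, and to let a [vocab_size, glove_dim] matrix initialised from GloVe do the lookup (in TensorFlow 1.x, tf.nn.embedding_lookup performs this step). The NumPy sketch below only illustrates that lookup and the resulting shapes. The toy vocabulary, the two example word triples, and the random vectors standing in for GloVe are assumptions, not data from the question.

```python
import numpy as np

glove_dim = 25   # dimension of each word vector, as in the question
n_input = 3      # number of context words per example

# Hypothetical toy vocabulary; the real one would come from the training text.
vocab = ["the", "mouse", "cat", "ran", "away"]
word_to_id = {word: i for i, word in enumerate(vocab)}

# Stand-in for pretrained GloVe vectors (random here, purely for illustration).
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), glove_dim)).astype(np.float32)

# A batch of two examples, each made of n_input word ids.
batch_words = [["the", "mouse", "ran"], ["the", "cat", "ran"]]
word_ids = np.array([[word_to_id[w] for w in words] for words in batch_words])

# The embedding "layer" is a row lookup: ids of shape (batch, n_input)
# become vectors of shape (batch, n_input, glove_dim).
embedded = embedding_matrix[word_ids]
print(word_ids.shape, embedded.shape)  # (2, 3) (2, 3, 25)
```

Under this layout, x would become an integer placeholder of shape [None, n_input], and the looked-up tensor would be what gets passed on to the LSTM stack.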

Please see my code below:

# Parameters 
learning_rate = 0.001 
training_iters = 50000 
display_step = 1000 
n_input = 3 
n_hidden = 512 

# tf Graph input 
x = tf.placeholder("float", [None, n_input, glove_dim]) 
y = tf.placeholder("float", [None, vocab_size]) 

# RNN output node weights and biases 
weights = {'out': tf.Variable(tf.random_normal([n_hidden, vocab_size]))} 
biases = {'out': tf.Variable(tf.random_normal([vocab_size]))} 

def RNN(x, weights, biases): 

    # reshape to [1, n_input] 
    x = tf.reshape(x, [-1, n_input]) 

    # Generate a n_input-element sequence of inputs 
    x = tf.split(x,n_input,1) 


    rnn_cell =rnn.MultiRNNCell([rnn.BasicLSTMCell(n_hidden),rnn.BasicLSTMCell(n_hidden)]) 

    # generate prediction 
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32) 

    return tf.matmul(outputs[-1], weights['out']) + biases['out'] 

pred = RNN(x, weights, biases) 

# Loss and optimizer 
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y)) 
optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate).minimize(cost) 

# Model evaluation 
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1)) 
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32)) 

# Initializing the variables 
init = tf.global_variables_initializer() 

# Launch the graph 
with tf.Session() as session: 
    session.run(init) 
    step = 0 
    offset = random.randint(0,n_input+1) 
    end_offset = n_input + 1 
    acc_total = 0 
    loss_total = 0 

    writer.add_graph(session.graph) 

    while step < training_iters: 
     # Generate a minibatch. Add some randomness on selection process. 
     if offset > (len(training_data)-end_offset): 
      offset = random.randint(0, n_input+1) 

     symbols_in_keys = [ [glove_dictionary[ str(training_data[i])]] for i in range(offset, offset+n_input) ] 
     symbols_in_keys = np.reshape(np.array(symbols_in_keys), [-1, n_input, glove_dim]) 

     symbols_out_onehot = np.zeros([vocab_size], dtype=float) 
     symbols_out_onehot[dictionary[str(training_data[offset+n_input])]] = 1.0 
     symbols_out_onehot = np.reshape(symbols_out_onehot,[1,-1]) 

     _, acc, loss, onehot_pred = session.run([optimizer, accuracy, cost, pred], \ 
              feed_dict={x:symbols_in_keys, y: symbols_out_onehot}) 
     loss_total += loss 
     acc_total += acc 
     if (step+1) % display_step == 0: 
      print("Iter= " + str(step+1) + ", Average Loss= " + \ 
        "{:.6f}".format(loss_total/display_step) + ", Average Accuracy= " + \ 
        "{:.2f}%".format(100*acc_total/display_step)) 
      acc_total = 0 
      loss_total = 0 
      symbols_in = [training_data[i] for i in range(offset, offset + n_input)] 
      symbols_out = training_data[offset + n_input] 
      symbols_out_pred = reverse_dictionary[int(tf.argmax(onehot_pred, 1).eval())] 
      print("%s - [%s] vs [%s]" % (symbols_in,symbols_out,symbols_out_pred)) 
     step += 1 
     offset += (n_input+1) 
    print("Optimization Finished!") 
    print("Elapsed time: ", elapsed(time.time() - start_time)) 
    print("Run on command line.") 
    print("\ttensorboard --logdir=%s" % (logs_path)) 
    print("Point your web browser to: http://localhost:6006/") 
    while True: 
     prompt = "%s words: " % n_input 
     sentence = input(prompt) 
     sentence = sentence.strip() 
     words = sentence.split(' ') 
     if len(words) != n_input: 
      continue 
     try: 
      symbols_in_keys = [glove_dictionary[str(words[i])] for i in range(len(words))] 
      for i in range(32): 
       keys = np.reshape(np.array(symbols_in_keys), [-1, n_input, 1]) 
       onehot_pred = session.run(pred, feed_dict={x: keys}) 
       onehot_pred_index = int(tf.argmax(onehot_pred, 1).eval()) 
       sentence = "%s %s" % (sentence,reverse_dictionary[onehot_pred_index]) 
       symbols_in_keys = symbols_in_keys[1:] 
       symbols_in_keys.append(onehot_pred_index) 
      print(sentence) 
     except: 
      print("Word not in dictionary") 

When I run it, I get this error:

InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [160,14313] and labels shape [10] 
    [[Node: SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](add, Reshape_1)]] 

How is the shape of the logits determined, and what can I do to fix my code?

Answer


I think the problem comes from where you do

# reshape to [1, n_input] 
x = tf.reshape(x, [-1, n_input]) 

# Generate a n_input-element sequence of inputs 
x = tf.split(x,n_input,1) 

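x is first reshaped to (batch_size * glove_dim, n_input)

and then split into n_input tensors of shape (batch_size * glove_dim, 1).

So rnn.static_rnn sees an input_size of 1, and you then project its output to vocab_size by multiplying by the weight matrix.

This results in an output of shape (batch_size * glove_dim, vocab_size).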
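The shape bookkeeping above can be traced with plain NumPy arrays. This is a rough sketch rather than the TensorFlow graph itself: batch_size = 1 (one example per training step) and vocab_size = 100 are assumed from the question's description, and the LSTM stack is replaced by a zero array that only preserves the number of rows.

```python
import numpy as np

batch_size, n_input, glove_dim = 1, 3, 25
n_hidden, vocab_size = 512, 100

x = np.zeros((batch_size, n_input, glove_dim), dtype=np.float32)

# Same reshape as in the question: the glove_dim axis is folded into the batch axis.
flat = x.reshape(-1, n_input)
print(flat.shape)                  # (25, 3)

# Same split as in the question: n_input pieces along axis 1.
steps = np.split(flat, n_input, axis=1)
print(len(steps), steps[0].shape)  # 3 (25, 1)

# Stand-in for the last LSTM output: one row of n_hidden units per input row.
last_output = np.zeros((steps[-1].shape[0], n_hidden), dtype=np.float32)

# Output projection, mirroring tf.matmul(outputs[-1], weights['out']).
weights_out = np.zeros((n_hidden, vocab_size), dtype=np.float32)
logits = last_output @ weights_out
labels = np.zeros((batch_size, vocab_size), dtype=np.float32)
print(logits.shape, labels.shape)  # (25, 100) (1, 100)
```

The first dimension of the logits is batch_size * glove_dim while the labels keep batch_size, which is the kind of mismatch the loss op complains about.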

Maybe you can try adding

x = [tf.reshape(w, [-1, glove_dim]) for w in x]

after

x = tf.split(x,n_input,1)

That actually worked! But I can't work out what the shape is now, after the reshape. – noobiejp

I'm not quite sure what you're asking. Do you mean the -1? – DAlolicorn

I mean, what is the shape of x after doing x = [tf.reshape(w, [-1, glove_dim]) for w in x]? – noobiejp
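For what it's worth, the shape after that list comprehension can be checked with a NumPy stand-in: x becomes a Python list of n_input arrays, each of shape (batch_size, glove_dim). The sketch below also compares this with slicing along the time axis, which is the NumPy analogue of tf.unstack(x, axis=1). The batch size of 2 and the test pattern filling x are assumptions chosen so that each value records which time step it came from.

```python
import numpy as np

batch_size, n_input, glove_dim = 2, 3, 25

# Fill x so that every value records its own time step: x[b, t, :] == t.
x = np.broadcast_to(
    np.arange(n_input, dtype=np.float32)[None, :, None],
    (batch_size, n_input, glove_dim),
).copy()

# Reshape and split as in the question, then apply the suggested extra reshape.
pieces = np.split(x.reshape(-1, n_input), n_input, axis=1)
fixed = [w.reshape(-1, glove_dim) for w in pieces]
print(len(fixed), fixed[0].shape)          # 3 (2, 25)

# Alternative: slice along the time axis, like tf.unstack(x, axis=1).
unstacked = [x[:, t, :] for t in range(n_input)]
print(len(unstacked), unstacked[0].shape)  # 3 (2, 25)

# Check whether each list element holds only values from its own time step.
print(all(np.all(step == t) for t, step in enumerate(unstacked)))  # True
print(all(np.all(step == t) for t, step in enumerate(fixed)))      # False
```

Both variants give the shapes that static_rnn expects, but in this sketch only the unstack-style slicing keeps each word's vector intact. The reshape-then-split route interleaves values from different time steps, so it may be worth verifying the data layout as well as the shapes.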