Implementing attention with beam search in TensorFlow

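I have written my own code with reference to this wonderful tutorial, but I cannot get results when I use attention with beam search. As far as I understand, the _build_decoder_cell function in the AttentionModel class creates a separate decoder cell and attention wrapper for inference mode. My code follows that assumption (which I think is incorrect, but I cannot find a way around it):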

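import tensorflow as tf
from tensorflow.python.layers.core import Dense

# dim, output_vocab_size, enc_rnn_out, enc_rnn_state, emb_x_y, emb_out,
# seq_len and maxlen are defined in the encoder part of the model (not shown).
with tf.name_scope("Decoder"):

    mem_units = 2*dim
    dec_cell = tf.contrib.rnn.BasicLSTMCell(2*dim)
    beam_cel = tf.contrib.rnn.BasicLSTMCell(2*dim)
    beam_width = 3
    out_layer = Dense(output_vocab_size)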

with tf.name_scope("Training"): 
    attn_mech = tf.contrib.seq2seq.BahdanauAttention(num_units = mem_units, memory = enc_rnn_out, normalize=True) 
    attn_cell = tf.contrib.seq2seq.AttentionWrapper(cell = dec_cell,attention_mechanism = attn_mech) 

    batch_size = tf.shape(enc_rnn_out)[0] 
    initial_state = attn_cell.zero_state(batch_size = batch_size , dtype=tf.float32) 
    initial_state = initial_state.clone(cell_state = enc_rnn_state) 

    helper = tf.contrib.seq2seq.TrainingHelper(inputs = emb_x_y , sequence_length = seq_len) 
    decoder = tf.contrib.seq2seq.BasicDecoder(cell = attn_cell, helper = helper, initial_state = initial_state ,output_layer=out_layer) 
    outputs, final_state, final_sequence_lengths= tf.contrib.seq2seq.dynamic_decode(decoder=decoder,impute_finished=True) 

    training_logits = tf.identity(outputs.rnn_output) 
    training_pred = tf.identity(outputs.sample_id) 

with tf.name_scope("Inference"): 

    enc_rnn_out_beam = tf.contrib.seq2seq.tile_batch(enc_rnn_out , beam_width) 
    seq_len_beam  = tf.contrib.seq2seq.tile_batch(seq_len  , beam_width) 
    enc_rnn_state_beam = tf.contrib.seq2seq.tile_batch(enc_rnn_state , beam_width) 

    batch_size_beam  = tf.shape(enc_rnn_out_beam)[0] # now batch size is beam_width times 

    # start_tokens must have the original (untiled) batch size, so divide by beam_width
    start_tokens = tf.tile(tf.constant([27], dtype=tf.int32), [ batch_size_beam//beam_width ]) 
    end_token = 0 

    attn_mech_beam = tf.contrib.seq2seq.BahdanauAttention(num_units = mem_units, memory = enc_rnn_out_beam, normalize=True) 
    cell_beam = tf.contrib.seq2seq.AttentionWrapper(cell=beam_cel,attention_mechanism=attn_mech_beam,attention_layer_size=mem_units) 

    initial_state_beam = cell_beam.zero_state(batch_size=batch_size_beam,dtype=tf.float32).clone(cell_state=enc_rnn_state_beam) 

    my_decoder = tf.contrib.seq2seq.BeamSearchDecoder(cell = cell_beam, 
                 embedding = emb_out, 
                 start_tokens = start_tokens, 
                 end_token = end_token, 
                 initial_state = initial_state_beam, 
                 beam_width = beam_width 
                 ,output_layer=out_layer) 

    beam_output, t1 , t2 = tf.contrib.seq2seq.dynamic_decode( my_decoder, 
                   maximum_iterations=maxlen) 

    beam_logits = tf.no_op() 
    beam_sample_id = beam_output.predicted_ids
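
For readers unfamiliar with what beam_width controls, here is a minimal pure-Python sketch of beam search. It is an illustration under stated assumptions, not how tf.contrib.seq2seq.BeamSearchDecoder is implemented: the toy_model function and its probability table are hypothetical, and only the start token 27 and end token 0 are taken from the code above.

```python
import math

def beam_search(step_log_probs, beam_width, start_token, end_token, max_len):
    """Keep the beam_width highest-scoring hypotheses at every step.

    step_log_probs(tokens) must return a dict {next_token: log_probability}.
    """
    beams = [([start_token], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == end_token:
                # Finished hypotheses are carried over unchanged.
                candidates.append((tokens, score))
                continue
            for token, log_prob in step_log_probs(tokens).items():
                candidates.append((tokens + [token], score + log_prob))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(tokens[-1] == end_token for tokens, _ in beams):
            break
    return beams

def toy_model(tokens):
    # Hypothetical next-token distribution that depends only on the last token.
    table = {
        27: {1: 0.6, 2: 0.4},
        1: {0: 0.55, 3: 0.45},
        2: {0: 0.9, 3: 0.1},
        3: {0: 1.0},
    }
    return {tok: math.log(p) for tok, p in table[tokens[-1]].items()}

greedy_best = beam_search(toy_model, 1, 27, 0, 5)[0][0]
beam_best = beam_search(toy_model, 3, 27, 0, 5)[0][0]
```

With a beam width of 1 the search commits to the locally best first token, while a width of 3 keeps the alternative alive long enough to find a sequence with a higher total probability.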

When I evaluate beam_sample_id after training, I do not get correct results.

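My guess is that we are supposed to use the same attention wrapper for training and inference, but that does not seem possible, because the encoder outputs have to be tiled with tile_batch before beam search can be used.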
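
To make the tiling step concrete, here is a rough pure-Python stand-in for what tf.contrib.seq2seq.tile_batch does to the batch dimension. The name tile_batch_sketch is hypothetical and plain lists replace tensors, so treat it as an illustration only: each batch entry is repeated beam_width times in a row, which is why start_tokens in the code above is sized by the original batch size.

```python
def tile_batch_sketch(batch, multiplier):
    """Repeat every batch entry `multiplier` times, keeping the copies adjacent."""
    return [entry for entry in batch for _ in range(multiplier)]

beam_width = 3
encoder_outputs = ["src_0", "src_1"]          # stand-in for enc_rnn_out, batch size 2
tiled = tile_batch_sketch(encoder_outputs, beam_width)

# The tiled batch is beam_width times larger, but the decoder still needs
# one start token per original sentence, hence the integer division.
original_batch_size = len(tiled) // beam_width
start_tokens = [27] * original_batch_size
```

The attention memory for inference therefore has a different batch size than the one used in training, although the weights applied to it can still be the same.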

Any insights or suggestions would be much appreciated.

I have also created an issue about this in their main repository: Issue-93

2017-09-03 Piyank Sarawagi

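I am not sure what you mean by "I am not able to get results", but I assume your model is not making use of the knowledge it learned during training.

If that is the case, the first thing to know is that this is all about variable sharing. You need to get rid of the different scopes for training and inference and use something like the following instead.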

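Remove

with tf.name_scope("Training"):

and use:

with tf.variable_scope("myScope"):

Then remove

with tf.name_scope("Inference"):

and use instead

with tf.variable_scope("myScope", reuse=True):

Also, at the beginning, right after with tf.variable_scope("myScope"), add:

enc_rnn_out = tf.contrib.seq2seq.tile_batch(enc_rnn_out , 1)
seq_len  = tf.contrib.seq2seq.tile_batch(seq_len  , 1)
enc_rnn_state = tf.contrib.seq2seq.tile_batch(enc_rnn_state , 1)

This ensures that your inference variables and training variables have the same signature and are shared.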
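
The idea behind sharing variables through a common scope can be sketched with a toy registry in plain Python. This is a conceptual model only, not TensorFlow's implementation: the VariableStore class, the variable name decoder_kernel and the scope name otherScope are all hypothetical. It mimics the TensorFlow 1.x rule that a variable lookup with reuse enabled returns the existing variable, while a lookup under a different scope creates a fresh, untrained one.

```python
class VariableStore:
    """Toy variable registry keyed by 'scope/name' (illustration only)."""

    def __init__(self):
        self.variables = {}

    def get_variable(self, scope, name, reuse=False):
        key = scope + "/" + name
        if reuse:
            if key not in self.variables:
                raise ValueError("Variable %s does not exist" % key)
            return self.variables[key]
        if key in self.variables:
            raise ValueError("Variable %s already exists" % key)
        self.variables[key] = {"value": 0.0}  # freshly initialised weight
        return self.variables[key]

store = VariableStore()

# Training branch creates the weight and (pretend) training updates it.
train_w = store.get_variable("myScope", "decoder_kernel")
train_w["value"] = 0.75

# Inference under the same scope with reuse sees the trained weight.
infer_shared = store.get_variable("myScope", "decoder_kernel", reuse=True)

# Inference under a different scope gets its own untrained weight.
infer_separate = store.get_variable("otherScope", "decoder_kernel")
```

In this sketch only infer_shared benefits from training, which is the behaviour the shared scope is meant to achieve for the beam search decoder.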

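I tested this while following the same tutorial you mentioned. My model is still training as I write this, but I can see the accuracy increasing as we speak, which suggests the solution should work for you as well.

Thanks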

2017-09-04 16:17:23

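Yes, with my approach I was not able to use the weights learned during training. tf.name_scope() has no "reuse" argument in version 1.3, so you must mean tf.variable_scope(). I solved this by creating two separate graphs for training and inference, as @dnnavn pointed out in the [GitHub issue](https://github.com/tensorflow/nmt/issues/93). He claims it is only possible with separate graphs; I still need to try this out. Meanwhile, if you have tried it successfully, please comment. Thanks

Yes, tf.variable_scope instead of tf.name_scope

Hi again, I can see that you have marked this answer as correct. Did you get a chance to test it on your data?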
