2016-05-17 167 views

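Tensorflow gradient is always zero

I have written a small Tensorflow program that convolves an image patch with the same convolution kernel num_unrollings times in a row, and then tries to minimize the mean squared difference between the resulting value and a target output. However, when I run the model with num_unrollings greater than 1, the gradient of my loss (tf_loss) with respect to the convolution kernel (tf_kernel) is zero, so no learning occurs.

Here is the smallest code (Python 3) I could come up with that reproduces the problem; sorry for the length: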

import tensorflow as tf 
import numpy as np 

batch_size = 1 
kernel_size = 3 
num_unrollings = 2 

input_image_size = (kernel_size//2 * num_unrollings)*2 + 1 

graph = tf.Graph() 

with graph.as_default(): 
    # Input data
    tf_input_images = tf.random_normal(
        [batch_size, input_image_size, input_image_size, 1]
    )

    tf_outputs = tf.random_normal(
        [batch_size]
    )

    # Convolution kernel
    tf_kernel = tf.Variable(
        tf.zeros([kernel_size, kernel_size, 1, 1])
    )

    # Perform convolution(s)
    _convolved_input = tf_input_images
    for _ in range(num_unrollings):
        _convolved_input = tf.nn.conv2d(
            _convolved_input,
            tf_kernel,
            [1, 1, 1, 1],
            padding="VALID"
        )

    tf_prediction = tf.reshape(_convolved_input, shape=[batch_size])

    tf_loss = tf.reduce_mean(
        tf.squared_difference(
            tf_prediction,
            tf_outputs
        )
    )

    # FIXME: why is this gradient zero when num_unrollings > 1??
    tf_gradient = tf.concat(0, tf.gradients(tf_loss, tf_kernel))

# Calculate and report gradient 
with tf.Session(graph=graph) as session: 

    tf.initialize_all_variables().run() 

    gradient = session.run(tf_gradient) 

    print(gradient.reshape(kernel_size**2)) 
    #prints [ 0. 0. 0. 0. 0. 0. 0. 0. 0.] 

Thanks for your help!


Initializing the kernel with all zeros is not a good idea, and in this case it is what causes the zero gradient. – etarion
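A scalar analogue may make this concrete. It is only a sketch, not the original TensorFlow graph: it assumes the image patch, the kernel and the target are single numbers x, k and y (hypothetical names), so that applying the kernel n times gives k**n * x. The chain rule then yields a gradient proportional to n * k**(n-1), which vanishes at k = 0 whenever n > 1 but not when n = 1.

```python
def loss_gradient(k, x, y, n):
    """Gradient of (k**n * x - y)**2 with respect to k.

    k is a scalar stand-in for the kernel, n for num_unrollings.
    """
    prediction = k ** n * x
    # Chain rule: d(prediction)/dk = n * k**(n-1) * x
    d_prediction = n * k ** (n - 1) * x
    return 2.0 * (prediction - y) * d_prediction


x, y = 1.5, -0.7  # arbitrary illustrative values

grad_one = loss_gradient(0.0, x, y, n=1)          # zero kernel, one application
grad_two = loss_gradient(0.0, x, y, n=2)          # zero kernel, two applications
grad_two_nonzero = loss_gradient(0.3, x, y, n=2)  # nonzero kernel, two applications

print(grad_one, grad_two, grad_two_nonzero)
```

With one application the factor k**(n-1) equals 1, so the gradient survives a zero kernel. With two, every term of the derivative still contains a factor of k, so a zero kernel zeroes the whole gradient.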

Answer


Try replacing

# Convolution kernel
tf_kernel = tf.Variable(
    tf.zeros([kernel_size, kernel_size, 1, 1])
)

with something like

# Convolution kernel
tf_kernel = tf.Variable(
    tf.random_normal([kernel_size, kernel_size, 1, 1])
)
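To see the effect of this change without relying on a particular TensorFlow version, here is a rough pure-Python sketch. It is an assumption-laden stand-in rather than the original graph: conv2d_valid, loss and numeric_gradient are hypothetical helpers, the gradient is estimated by central finite differences instead of tf.gradients, and the random values come from Python's random module.

```python
import random


def conv2d_valid(image, kernel):
    """'VALID' 2-D cross-correlation on nested lists (no padding, stride 1)."""
    k = len(kernel)
    out = len(image) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out)]
            for i in range(out)]


def loss(image, kernel, target, num_unrollings):
    """Squared error after applying the same kernel num_unrollings times."""
    x = image
    for _ in range(num_unrollings):
        x = conv2d_valid(x, kernel)
    return (x[0][0] - target) ** 2  # x is 1x1 for the sizes used below


def numeric_gradient(image, kernel, target, num_unrollings, eps=1e-4):
    """Central finite-difference estimate of d(loss)/d(kernel), flattened."""
    grads = []
    k = len(kernel)
    for a in range(k):
        for b in range(k):
            plus = [row[:] for row in kernel]
            minus = [row[:] for row in kernel]
            plus[a][b] += eps
            minus[a][b] -= eps
            diff = (loss(image, plus, target, num_unrollings)
                    - loss(image, minus, target, num_unrollings))
            grads.append(diff / (2 * eps))
    return grads


random.seed(0)
kernel_size, num_unrollings = 3, 2
size = (kernel_size // 2 * num_unrollings) * 2 + 1  # same formula as the question
image = [[random.gauss(0, 1) for _ in range(size)] for _ in range(size)]
target = random.gauss(0, 1)

zero_kernel = [[0.0] * kernel_size for _ in range(kernel_size)]
random_kernel = [[random.gauss(0, 1) for _ in range(kernel_size)]
                 for _ in range(kernel_size)]

zero_grad = numeric_gradient(image, zero_kernel, target, num_unrollings)
random_grad = numeric_gradient(image, random_kernel, target, num_unrollings)

print("largest |gradient|, zero kernel:  ", max(abs(g) for g in zero_grad))
print("largest |gradient|, random kernel:", max(abs(g) for g in random_grad))
```

The zero kernel should give a gradient estimate of zero in every component, while the randomly initialized kernel should give at least some clearly nonzero components, which is what allows learning to start.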