2017-05-29

I want to implement a simple numerical gradient check using Python 3 and numpy, for use with neural networks. (Related question: Neural network numerical gradient check, not using matrices, with Python/numpy.)

It works for simple one-dimensional functions, but fails when applied to matrices of parameters.

My guess is that either my cost function is not computed correctly for matrices, or the way I do the numerical gradient check is somehow wrong.

See the code below, and thanks for your help!

import numpy as np 
import random 
import copy 

def gradcheck_naive(f, x): 
    """ Gradient check for a function f. 

    Arguments: 
    f -- a function that takes a single argument (x) and outputs the 
         cost (fx) and its gradients grad 
    x -- the point (numpy array) to check the gradient at 
    """ 
    rndstate = random.getstate() 
    random.setstate(rndstate) 
    fx, grad = f(x)  # Evaluate function value at original point 
    # fx = cost 
    # grad = gradient 
    h = 1e-4 
    # Iterate over all indexes in x 
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite']) 
    while not it.finished: 
        ix = it.multi_index  # multi-index number 

        random.setstate(rndstate) 
        xp = copy.deepcopy(x) 
        xp[ix] += h 
        fxp, gradp = f(xp) 

        random.setstate(rndstate) 
        xn = copy.deepcopy(x) 
        xn[ix] -= h 
        fxn, gradn = f(xn) 
        numgrad = (fxp - fxn) / (2 * h) 

        # Compare gradients 
        reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix])) 
        if reldiff > 1e-5: 
            print("Gradient check failed.") 
            print("First gradient error found at index %s" % str(ix)) 
            print("Your gradient: %f \t Numerical gradient: %f" % ( 
                grad[ix], numgrad)) 
            return 

        it.iternext()  # Step to next dimension 

    print("Gradient check passed!") 

#sanity check with 1D function 
exp_f = lambda x: (np.sum(np.exp(x)), np.exp(x)) 
gradcheck_naive(exp_f, np.random.randn(4,5)) #this works fine 

#sanity check with matrices 
#forward pass 
W = np.random.randn(5,10) 
x = np.random.randn(10,3) 
D = W.dot(x) 

#backpropagation pass 
gradx = W 

func_f = lambda x: (np.sum(W.dot(x)), gradx) 
gradcheck_naive(func_f, np.random.randn(10,3)) #this does not work (grad check fails) 

Answer


I figured it out! (My math teacher would be proud...)

The short answer is that I was mixing up the matrix dot product and the element-wise product.

When using the element-wise product, the gradient is:

W = np.array([[2,4],[3,5],[3,1]]) 
x = np.array([[1,7],[5,-1],[4,7]]) 
D = W*x #element-wise multiplication 

gradx = W 

func_f = lambda x: (np.sum(W*x), gradx) 
gradcheck_naive(func_f, np.random.randn(3,2)) 
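As a quick sanity check on that claim, here is a minimal standalone sketch (not from the original post) that verifies, by centered finite differences, that the gradient of np.sum(W*x) with respect to x is simply W:

```python
import numpy as np

W = np.array([[2., 4.], [3., 5.], [3., 1.]])
x = np.random.randn(3, 2)
h = 1e-4

# Centered finite difference for every entry of x
numgrad = np.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        xp = x.copy(); xp[i, j] += h
        xn = x.copy(); xn[i, j] -= h
        numgrad[i, j] = (np.sum(W * xp) - np.sum(W * xn)) / (2 * h)

# Analytic gradient of sum(W * x) w.r.t. x is W itself
print(np.allclose(numgrad, W, atol=1e-6))  # prints: True
```

Since sum(W*x) is linear in each entry of x, the finite difference recovers W[i, j] exactly up to floating-point rounding.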

When using the dot product, the gradient becomes:

W = np.array([[2,4],[3,5]]) 
x = np.array([[1,7],[5,-1],[5,1]]) 
D = x.dot(W) 

unitary = np.array([[1,1],[1,1],[1,1]]) 
gradx = unitary.dot(np.transpose(W)) 

func_f = lambda x: (np.sum(x.dot(W)), gradx) 
gradcheck_naive(func_f, np.random.randn(3,2)) 
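The reason the ones matrix appears: the derivative of sum(x.dot(W)) with respect to x[i, j] is sum_k W[j, k], which is exactly entry (i, j) of np.ones((3, 2)).dot(W.T). A minimal standalone finite-difference sketch (my own, not from the post) confirming this:

```python
import numpy as np

W = np.array([[2., 4.], [3., 5.]])
x = np.random.randn(3, 2)
h = 1e-4

# d/dx[i,j] of sum(x.dot(W)) = sum_k W[j,k] = ones.dot(W.T)[i,j]
analytic = np.ones((3, 2)).dot(W.T)

numgrad = np.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        xp = x.copy(); xp[i, j] += h
        xn = x.copy(); xn[i, j] -= h
        numgrad[i, j] = (np.sum(xp.dot(W)) - np.sum(xn.dot(W))) / (2 * h)

print(np.allclose(numgrad, analytic, atol=1e-6))  # prints: True
```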

I also wondered how the element-wise product behaves with matrices of unequal dimensions, as in:

x = np.random.randn(10) 
W = np.random.randn(3,10) 

D1 = x*W 
D2 = W*x 

It turns out that D1 = D2 (with the same dimensions as W, 3x10); my understanding is that numpy broadcasts x into a 3x10 matrix, which allows the element-wise multiplication.
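That broadcasting behavior can be confirmed directly (a small standalone check, assuming the same shapes as above): numpy stretches the (10,)-shaped x along a new leading axis to match W's (3, 10) shape, and element-wise multiplication is commutative, so the two products are identical.

```python
import numpy as np

x = np.random.randn(10)      # shape (10,)
W = np.random.randn(3, 10)   # shape (3, 10)

D1 = x * W
D2 = W * x

# x is broadcast along the first axis to shape (3, 10),
# so both products are element-wise and equal
print(D1.shape)                # prints: (3, 10)
print(np.array_equal(D1, D2))  # prints: True
```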

Conclusion: when in doubt, write it out with small matrices to find where the error is.


Well done! – Aaron