Why is this gradient invalid (Theano)?

I don't understand why the following code fails.

from numpy import * 
import theano.tensor as T 
x = T.dmatrix("x") 
mx = x[...,None,:] 
a = T.ones((1,3)) 
T.grad(mx[...,0].dot(a).sum(), a).eval({x:ones((5,10)).astype(float32)})

It raises the following error:

--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
/home/yu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs) 
    883    outputs =\ 
--> 884     self.fn() if output_subset is None else\ 
    885     self.fn(output_subset=output_subset) 

ValueError: Shape mismatch: A.shape[1] != x.shape[0] 

During handling of the above exception, another exception occurred: 

ValueError        Traceback (most recent call last) 
<ipython-input-74-52410617594a> in <module>() 
     3 mx = x[...,None,:] 
     4 a = T.ones((1,3)) 
----> 5 T.grad(mx[...,0].dot(a).sum(), a).eval({x:ones((5,10)).astype(float32)}) 

/home/yu/anaconda3/lib/python3.5/site-packages/theano/gof/graph.py in eval(self, inputs_to_values) 
    517   args = [inputs_to_values[param] for param in inputs] 
    518 
--> 519   rval = self._fn_cache[inputs](*args) 
    520 
    521   return rval 

/home/yu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs) 
    896      node=self.fn.nodes[self.fn.position_of_error], 
    897      thunk=thunk, 
--> 898      storage_map=getattr(self.fn, 'storage_map', None)) 
    899    else: 
    900     # old-style linkers raise their own exceptions 

/home/yu/anaconda3/lib/python3.5/site-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map) 
    323   # extra long error message in that case. 
    324   pass 
--> 325  reraise(exc_type, exc_value, exc_trace) 
    326 
    327 

/home/yu/anaconda3/lib/python3.5/site-packages/six.py in reraise(tp, value, tb) 
    683    value = tp() 
    684   if value.__traceback__ is not tb: 
--> 685    raise value.with_traceback(tb) 
    686   raise value 
    687 

/home/yu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs) 
    882   try: 
    883    outputs =\ 
--> 884     self.fn() if output_subset is None else\ 
    885     self.fn(output_subset=output_subset) 
    886   except Exception: 

ValueError: Shape mismatch: A.shape[1] != x.shape[0] 
Apply node that caused the error: CGemv{inplace}(AllocEmpty{dtype='float64'}.0, TensorConstant{1.0}, InplaceDimShuffle{1,0}.0, Rebroadcast{0}.0, TensorConstant{0.0}) 
Toposort index: 7 
Inputs types: [TensorType(float64, vector), TensorType(float64, scalar), TensorType(float64, matrix), TensorType(float64, vector), TensorType(float64, scalar)] 
Inputs shapes: [(3,),(), (3, 5), (1,),()] 
Inputs strides: [(8,),(), (8, 24), (80,),()] 
Inputs values: [array([ 0.00000000e+000, 4.94065646e-324, 9.88131292e-324]), array(1.0), 'not shown', array([ 1.]), array(0.0)] 
Inputs type_num: [12, 12, 12, 12, 12] 
Outputs clients: [[InplaceDimShuffle{x,0}(CGemv{inplace}.0)]] 

Debugprint of the apply node: 
CGemv{inplace} [id A] <TensorType(float64, vector)> '' 
|AllocEmpty{dtype='float64'} [id B] <TensorType(float64, vector)> '' 
| |TensorConstant{3} [id C] <TensorType(int64, scalar)> 
|TensorConstant{1.0} [id D] <TensorType(float64, scalar)> 
|InplaceDimShuffle{1,0} [id E] <TensorType(float64, matrix)> '' 
| |Alloc [id F] <TensorType(float64, matrix)> '' 
| |TensorConstant{(1, 1) of 1.0} [id G] <TensorType(float64, (True, True))> 
| |Shape_i{0} [id H] <TensorType(int64, scalar)> '' 
| | |x [id I] <TensorType(float64, matrix)> 
| |TensorConstant{3} [id C] <TensorType(int64, scalar)> 
|Rebroadcast{0} [id J] <TensorType(float64, vector)> '' 
| |Subtensor{int8, ::, int64} [id K] <TensorType(float64, (True,))> '' 
| |InplaceDimShuffle{0,x,1} [id L] <TensorType(float64, (False, True, False))> '' 
| | |x [id I] <TensorType(float64, matrix)> 
| |Constant{0} [id M] <int8> 
| |Constant{0} [id N] <int64> 
|TensorConstant{0.0} [id O] <TensorType(float64, scalar)> 

Storage map footprint: 
- x, Input, Shape: (5, 10), ElemSize: 8 Byte(s), TotalSize: 400 Byte(s) 
- InplaceDimShuffle{0,x,1}.0, Shape: (5, 1, 10), ElemSize: 8 Byte(s), TotalSize: 400 Byte(s) 
- Alloc.0, Shape: (5, 3), ElemSize: 8 Byte(s), TotalSize: 120 Byte(s) 
- InplaceDimShuffle{1,0}.0, Shape: (3, 5), ElemSize: 8 Byte(s), TotalSize: 120 Byte(s) 
- AllocEmpty{dtype='float64'}.0, Shape: (3,), ElemSize: 8 Byte(s), TotalSize: 24 Byte(s) 
- Subtensor{int8, ::, int64}.0, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s) 
- Shape_i{0}.0, Shape:(), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s) 
- TensorConstant{1.0}, Shape:(), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s) 
- TensorConstant{0.0}, Shape:(), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s) 
- Constant{0}, Shape:(), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s) 
- Rebroadcast{0}.0, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s) 
- TensorConstant{3}, Shape:(), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s) 
- TensorConstant{(1, 1) of 1.0}, Shape: (1, 1), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s) 
- Constant{0}, Shape:(), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s) 
TotalSize: 593.0 Byte(s) 0.000 GB 
TotalSize inputs: 441.0 Byte(s) 0.000 GB 

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'. 

-----------------------------------------------------------------------

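I suspected that the broadcasting operation in the script above was the problem, so I rewrote it so that no broadcast dimension is inserted before taking the gradient: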

x = T.tensor3("x") 
mx = x 
a = T.ones((1,3)) 
T.grad(mx[...,0].dot(a).sum(), a).eval({x:ones((5,1,10)).astype(float32)})

This version ran successfully and printed the following result:

array([[ 5., 5., 5.]], dtype=float32)
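
As a sanity check on the math (not a reproduction of the Theano error), the same gradient can be worked out by hand in NumPy. This is a minimal sketch that uses NumPy as a stand-in for Theano; `grad_wrt_a` is a hypothetical helper written only for this illustration. It uses the fact that for f = sum(col @ a), the derivative with respect to a[k, j] is the sum of column k of col.

```python
import numpy as np

def grad_wrt_a(mx, a):
    """Hand-derived gradient of (mx[..., 0] @ a).sum() with respect to a."""
    col = mx[..., 0]                                # shape (5, 1)
    upstream = np.ones((col.shape[0], a.shape[1]))  # d(sum)/d(col @ a), shape (5, 3)
    return col.T @ upstream                         # shape (1, 3), same as a

a = np.ones((1, 3))

# First snippet: a (5, 10) matrix with an axis inserted, giving (5, 1, 10).
g_broadcast = grad_wrt_a(np.ones((5, 10))[..., None, :], a)

# Second snippet: a tensor that already has shape (5, 1, 10).
g_tensor3 = grad_wrt_a(np.ones((5, 1, 10)), a)

print(g_broadcast)
print(g_tensor3)
```

Both layouts give the same (1, 3) gradient here, which suggests the inserted axis is not a problem mathematically and that the shape mismatch arises somewhere in the compiled Theano graph instead.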

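But why does the first version fail? Is broadcasting mathematically invalid for gradients?
Why do the shapes end up mismatched in the gradient?

Could you explain what is going on here?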

Source

2017-08-09 Yu Sato
