How to code AdaGrad in Python with Theano

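To simplify the problem: say a dimension (or feature) has already been updated n times. The next time I see that feature, I want to set its learning rate to 1/n.

I came up with this code: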

import numpy as np
import theano
import theano.tensor as T

def test_adagrad():
    embedding = theano.shared(value=np.random.randn(20,10), borrow=True)
    times = theano.shared(value=np.ones((20,1)))
    lr = T.dscalar()
    index_a = T.lvector()
    hist = times[index_a]
    cost = T.sum(theano.sparse_grad(embedding[index_a]))
    gradients = T.grad(cost, embedding)
    updates = [(embedding, embedding+lr*(1.0/hist)*gradients)]
    ### The code that also updates times belongs here; it is omitted ###
    train = theano.function(inputs=[index_a, lr],outputs=cost,updates=updates)
    for i in range(10):
        print(train([1,2,3],0.05))

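Theano does not report any error, but the training sometimes produces NaN. Does anyone know how to fix this?

Thanks for your help.

PS: I suspect the operations in the sparse space are causing the problem, so I tried replacing * with theano.sparse.mul. This gave the same results as described above.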
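The per-feature 1/n schedule described in the question can be sketched without Theano to make the intended arithmetic explicit. This is only an illustration in plain Python: the function name `sparse_step` and the `base_lr` parameter are hypothetical, the counters start at 1 like `times` in the question, and the update is written as gradient descent.

```python
def sparse_step(weights, counts, indices, grad_rows, base_lr=1.0):
    """Update only the rows listed in `indices`, in place.

    A row that has been seen n times (counters start at 1, like `times`
    in the question) is updated with learning rate base_lr / n.
    """
    for idx, grad_row in zip(indices, grad_rows):
        lr = base_lr / counts[idx]
        weights[idx] = [w - lr * g for w, g in zip(weights[idx], grad_row)]
        counts[idx] += 1  # this row has now been updated once more
    return weights, counts


weights = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
counts = [1, 1, 1]
for _ in range(2):
    sparse_step(weights, counts, [0, 2], [[1.0, 1.0], [1.0, 1.0]], base_lr=0.5)
print(weights, counts)
```

Rows that are never indexed keep both their weights and their counter, so their next update still uses the full base learning rate.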

2015-03-31 jmf_zaiecp

Maybe you can use the following example for implementation of adadelta and derive your own from it. Please post an update if you succeed :-)

2015-04-15 07:12:23 zuuz

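Thank you very much for your answer – 2015-04-16 08:15:44

You're welcome :-) If you found it useful, please mark the answer as "accepted" and upvote it :-) Also, if you want to help future users, you could attach your implementation... – zuuz 2015-04-17 11:23:10

I was looking for the same thing and ended up implementing it myself, in the style of the resource zuuz pointed to. So this may help anyone who comes here looking for help.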

import numpy as np
import theano
import theano.tensor as T

def adagrad(lr, tparams, grads, inp, cost):
    # shared variables that store the current gradients
    gshared = [theano.shared(np.zeros_like(p.get_value(),
                                           dtype=theano.config.floatX),
                             name='%s_grad' % k)
               for k, p in tparams.items()]
    grads_updates = list(zip(gshared, grads))
    # shared variables that store the running sum of squared gradients
    hist_gshared = [theano.shared(np.zeros_like(p.get_value(),
                                                dtype=theano.config.floatX),
                                  name='%s_hist_grad' % k)
                    for k, p in tparams.items()]
    rgrads_updates = [(rg, rg + T.sqr(g)) for rg, g in zip(hist_gshared, grads)]

    # compute the cost, store the gradients and accumulate their squares
    f_grad_shared = theano.function(inp, cost,
                                    updates=grads_updates + rgrads_updates,
                                    on_unused_input='ignore')

    # apply the actual update, starting from the initial learning rate lr
    n = 1e-6  # small constant that avoids division by zero
    updates = [(p, p - (lr/(T.sqrt(rg) + n))*g)
               for p, g, rg in zip(tparams.values(), gshared, hist_gshared)]

    f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore')

    return f_grad_shared, f_update
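The two-step structure above (first store the gradients and their squared sum, then apply the scaled update) can be followed more easily in a Theano-free sketch. The helper names `store_grads` and `apply_update` and the quadratic toy objective are assumptions made for illustration; they only mirror what `f_grad_shared` and `f_update` do with their shared variables.

```python
import math


def store_grads(grads, gshared, hist):
    """Step 1 (like f_grad_shared): keep the gradients and add their squares to the history."""
    for i, g in enumerate(grads):
        gshared[i] = g
        hist[i] += g * g


def apply_update(params, gshared, hist, lr, eps=1e-6):
    """Step 2 (like f_update): scale each stored gradient by lr / (sqrt(history) + eps)."""
    for i, g in enumerate(gshared):
        params[i] -= lr / (math.sqrt(hist[i]) + eps) * g


# Toy problem: minimise f(p) = sum(p_i ** 2), whose gradient is 2 * p_i.
params = [1.0, -2.0]
gshared = [0.0, 0.0]
hist = [0.0, 0.0]
for _ in range(50):
    store_grads([2.0 * p for p in params], gshared, hist)
    apply_update(params, gshared, hist, lr=0.5)
print(params)
```

On the very first step every coordinate moves by roughly lr, whatever the size of its gradient, because the gradient is divided by its own absolute value.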

2016-04-25 21:20:16 ValD

I found this implementation from Lasagne very concise and readable. You can use it pretty much as it is:

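from collections import OrderedDict
import numpy as np
import theano
import theano.tensor as T

# params, grads, learning_rate and epsilon are assumed to be defined already
updates = OrderedDict()
for param, grad in zip(params, grads):
    value = param.get_value(borrow=True)
    accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                         broadcastable=param.broadcastable)
    accu_new = accu + grad ** 2
    updates[accu] = accu_new
    updates[param] = param - (learning_rate * grad /
                              T.sqrt(accu_new + epsilon))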
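Note that this variant puts epsilon inside the square root. A coordinate that has never received a gradient (for example an embedding row that was never indexed) has both a zero gradient and a zero accumulator; without epsilon its update would be 0/0, which floating-point array code typically evaluates to NaN. The plain-Python sketch below illustrates the same single-step rule; the function name `adagrad_step` and the default values are hypothetical.

```python
import math


def adagrad_step(params, grads, accu, learning_rate=0.1, epsilon=1e-6):
    """One AdaGrad step with epsilon inside the square root, applied in place."""
    for i, g in enumerate(grads):
        accu[i] += g * g
        params[i] -= learning_rate * g / math.sqrt(accu[i] + epsilon)
    return params, accu


# The first coordinate has a zero gradient and an empty accumulator;
# epsilon keeps the denominator positive, so its step is exactly 0.
params, accu = adagrad_step([1.0, 1.0], [0.0, 3.0], [0.0, 0.0])
print(params, accu)
```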

2016-09-19 06:58:27
