Training an MLP in Theano

I'm a bit stuck trying to train a pretty standard MLP model using Theano. My model code looks like this:

 
import numpy as np
import theano
import theano.tensor as T

class Layer(object):
    def __init__(self, inputs, n_in, n_out, activation=T.nnet.softmax):
        def weights(shape):
            # np.random.uniform defaults to the range [0, 1)
            return np.array(np.random.uniform(size=shape), dtype='float64')
        def biases(size):
            return np.zeros((size), dtype='float64')

        self.W = theano.shared(value=weights((n_in, n_out)), name='weights', borrow=True)
        self.b = theano.shared(value=biases(n_out), name='biases', borrow=True)
        self.output = activation(T.dot(inputs, self.W) + self.b)
        self.pred = T.argmax(self.output, axis=1)
        self.params = [self.W, self.b]

class MLP(object):
    def __init__(self, inputs, n_in, n_hidden, n_out):
        """ for now lets go with one hidden layer"""
        self._hidden = Layer(inputs, n_in, n_hidden, activation=T.tanh)
        self._output = Layer(self._hidden.output, n_hidden, n_out) # softmax by default

    def loss(self, one_hot):
        # mean squared error against the one-hot labels
        return T.mean(T.sqr(one_hot - self._output.output))

    def accuracy(self, y):
        return T.mean(T.eq(self._output.pred, y))

    def updates(self, loss, rate=0.01):
        # plain gradient descent step for every parameter
        updates = []
        updates.append((self._hidden.W, self._hidden.W - rate * T.grad(cost=loss, wrt=self._hidden.W)))
        updates.append((self._hidden.b, self._hidden.b - rate * T.grad(cost=loss, wrt=self._hidden.b)))
        updates.append((self._output.W, self._output.W - rate * T.grad(cost=loss, wrt=self._output.W)))
        updates.append((self._output.b, self._output.b - rate * T.grad(cost=loss, wrt=self._output.b)))
        return updates

Then I try to train it like this:

 
import time

x = T.matrix('x', dtype='float64')
y = T.vector('y', dtype='int32')

# basic logistic model 
# model = Layer(x, 784, 10, activation=T.nnet.softmax) 
# basic multi-layer perceptron 
model = MLP(x, 784, 128, 10) 

labels = T.extra_ops.to_one_hot(y, 10) 
# loss function 
#loss = T.mean(T.sqr(labels - model.output)) 
loss = model.loss(labels) 
# average number of correct predictions over a batch 
#accuracy = T.mean(T.eq(model.pred, y)) 
accuracy = model.accuracy(y) 

# updates 
#rate = 0.05 
#g_W = T.grad(cost=loss, wrt=model.W) 
#g_b = T.grad(cost=loss, wrt=model.b) 
#updates = [(model.W, model.W - rate * g_W), 
#   (model.b, model.b - rate * g_b)] 
updates = model.updates(loss, rate=0.3) 

# batch index 
index = T.scalar('batch index', dtype='int32') 
size = T.scalar('batch size', dtype='int32') 

train = theano.function([index, size],
                        [loss, accuracy],
                        updates=updates,
                        givens={x: train_set[0][index * size: (index + 1) * size],
                                y: train_set[1][index * size: (index + 1) * size]})

valid = theano.function([index, size],
                        [loss, accuracy],
                        givens={x: valid_set[0][index * size: (index + 1) * size],
                                y: valid_set[1][index * size: (index + 1) * size]})

test = theano.function([index, size],
                       [accuracy],
                       givens={x: test_set[0][index * size: (index + 1) * size],
                               y: test_set[1][index * size: (index + 1) * size]})

n_epochs = 10 
batch_size = 500 
# number of items in training dataset/batch size 
batches_in_epoch = datasets[0][0].shape[0] // batch_size 

losses = np.empty(0) 
errors = np.empty(0) 

for epoch in range(1, n_epochs + 1):
    epoch_losses = np.empty(0)
    epoch_errors = np.empty(0)
    for batch_n in range(batches_in_epoch):
        l, e = train(batch_n, batch_size)
        epoch_losses = np.append(epoch_losses, l)
        epoch_errors = np.append(epoch_errors, e)
        print('[%s]' % time.ctime(),
              'epoch: ', epoch,
              'batch: ', batch_n,
              'loss: ', np.round(l, 4),
              'accuracy: ', np.round(e, 4))
    # shuffle train set every epoch
    shuffle = np.arange(datasets[0][1].shape[0])
    np.random.shuffle(shuffle)
    train_set[0] = train_set[0][shuffle]
    train_set[1] = train_set[1][shuffle]

    losses = np.concatenate([losses, epoch_losses])
    errors = np.concatenate([errors, epoch_errors])
    valid_l, valid_e = valid(0, datasets[1][0].shape[0])
    print('[%s]' % time.ctime(), 'epoch: ', epoch, 'validation loss: ', valid_l, 'validation accuracy: ', valid_e)

acc = test(0, datasets[2][0].shape[0]) 
print() 
print('Final accuracy: ', np.round(acc, 4)[0])

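Now, if you look at the commented-out lines, I first tried a basic logistic regression model and it works: I get about 80% accuracy. But when I replace it with the MLP model, it doesn't work. It doesn't converge to anything, and I get the 10% accuracy of random guessing. What am I doing wrong? The data I'm using is the MNIST dataset, loaded into shared variables the way the Theano tutorials do it.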


2016-08-01 Mad Wombat

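How a network should be built depends on the data, but for a dataset with an input dimension of 784, 128 units in the hidden layer may be a bit low (that is a large dimensionality reduction and may cause information loss). Too few hidden units can therefore prevent convergence. You may want to look [here](http://stackoverflow.com/questions/10565868/multi-layer-perceptron-mlp-architecture-criteria-for-choosing-number-of-hidde) and [here](ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hu). I suggest starting with a larger hidden layer, say 1024 or 512 units, and then tuning it down to smaller values. – MGoksu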

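I have tried many different configurations, and I get the same result with 128, 256, 512, 1024 and 2048 units. All of these work fine for me when I do it in Tensorflow. I get different accuracies, but even with a 128-unit hidden layer I reach around 97% accuracy. MNIST is not a hard dataset to classify. So I suspect this is a bug in my Theano code rather than a problem with the model. –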

The problem seems to lie in the weight initialization. How did you do it in your Tensorflow implementation?

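I'm not too sure about the underlying math right now, so correct me if I'm wrong, but I like to interpret it this way: if all the weights are positive, the model cannot learn negative features.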
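A quick NumPy check (no Theano needed) can make this interpretation more concrete. It is only a sketch under stated assumptions: random values in [0, 1) stand in for the MNIST pixels, the layer sizes are the 784 and 128 from the question, and the weights are drawn from `np.random.uniform`'s default range as in the question's `weights` helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, batch = 784, 128, 32

# Stand-in for MNIST pixels scaled to [0, 1); an assumption, not the real data.
x = rng.uniform(size=(batch, n_in))

# Same scheme as the question: weights drawn from U(0, 1), biases at zero.
W = rng.uniform(size=(n_in, n_hidden))

pre = x @ W                       # hidden pre-activations
hidden = np.tanh(pre)             # hidden layer output
grad_factor = 1.0 - hidden ** 2   # local derivative of tanh

print("smallest pre-activation:", pre.min())
print("smallest tanh output:   ", hidden.min())
print("largest tanh derivative:", grad_factor.max())
```

With non-negative inputs and non-negative weights, every pre-activation in this sketch is a large positive number, so tanh returns essentially 1 for every unit and every sample and its derivative is essentially 0. If the real data behaves similarly, the hidden weights would receive almost no gradient and the output layer would see nearly the same input for every image, which is consistent with chance-level accuracy.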

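You can try adding low=-1, high=1 to the initialization (by default np.random.uniform draws between 0 and 1). In my tests this took quite a long time to converge (~100 epochs), but at least it did.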

Using a slightly smarter Glorot initialization like this:

def weights(shape):
    # Glorot uniform: limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6. / sum(shape))
    return np.random.uniform(low=-limit, high=limit, size=shape)

makes training a lot faster. With this added to your code, I got about 90% validation accuracy after 5 epochs.

This is also how the weights are initialized in the Theano MLP example.
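To get a feel for why the two symmetric ranges behave so differently, the sketch below compares them on the first layer only. It makes the same assumptions as any quick check of this kind: random values in [0, 1) stand in for MNIST pixels, the sizes 784 and 128 come from the question, and `saturated_fraction` is a hypothetical helper introduced here for illustration, not part of Theano or NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, batch = 784, 128, 64

# Stand-in for MNIST pixels in [0, 1); an assumption, not the real data.
x = rng.uniform(size=(batch, n_in))

def saturated_fraction(W, threshold=0.99):
    """Share of tanh outputs whose magnitude exceeds `threshold` (hypothetical helper)."""
    hidden = np.tanh(x @ W)
    return float(np.mean(np.abs(hidden) > threshold))

# Symmetric but wide range: the low=-1, high=1 suggestion.
W_wide = rng.uniform(low=-1.0, high=1.0, size=(n_in, n_hidden))

# Glorot uniform: limit = sqrt(6 / (fan_in + fan_out)).
limit = np.sqrt(6.0 / (n_in + n_hidden))
W_glorot = rng.uniform(low=-limit, high=limit, size=(n_in, n_hidden))

print("Glorot limit:", limit)
print("saturated fraction, U(-1, 1):", saturated_fraction(W_wide))
print("saturated fraction, Glorot:  ", saturated_fraction(W_glorot))
```

In this sketch most tanh units start out saturated under U(-1, 1), which may be part of why that variant needed around 100 epochs, while the Glorot range keeps almost all units in the region where tanh still has a usable gradient.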


2017-03-02 20:02:45 ValD
