2017-10-09

Keras MNIST gradient descent stuck / learning very slowly

I'm training a simple MLP with Keras to classify MNIST digits. I've run into a problem where, no matter which optimizer and learning rate I use, the model never learns (the loss never decreases) and my accuracy stays no better than random guessing.

The code is below:

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adagrad

model2 = Sequential()
model2.add(Dense(512, input_dim=784, activation='relu', name='dense1', kernel_initializer='random_uniform'))
model2.add(Dropout(0.2, name='dropout1'))
model2.add(Dense(512, activation='relu', name='dense2', kernel_initializer='random_uniform'))
model2.add(Dropout(0.2, name='dropout2'))
model2.add(Dense(10, activation='softmax', name='dense3', kernel_initializer='random_uniform'))
model2.compile(optimizer=Adagrad(), loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()
model2.fit(image_train.as_matrix(), img_keras_lb, batch_size=128, epochs=100)

And the output:

Epoch 1/100 
33600/33600 [==============================] - 5s - loss: 14.6704 - acc: 0.0894  
Epoch 2/100 
33600/33600 [==============================] - 4s - loss: 14.6809 - acc: 0.0892  
Epoch 3/100 
33600/33600 [==============================] - 4s - loss: 14.6809 - acc: 0.0892  
Epoch 4/100 
33600/33600 [==============================] - 4s - loss: 14.6809 - acc: 0.0892  
Epoch 5/100 
33600/33600 [==============================] - 4s - loss: 14.6809 - acc: 0.0892 
[... epochs 6-21 identical ...] 
Epoch 22/100 
33600/33600 [==============================] - 4s - loss: 14.6809 - acc: 0.0892 

As you can see, the model isn't learning anything. I have also tried SGD, Adam, and RMSprop, and lowered the batch size to 32, 16, and so on.

Any pointers as to why this is happening would be much appreciated!

Answers

Answer 1 (score: 3)

You are using ReLU activations, which essentially cut off any activation below 0, together with the random_uniform initializer, whose defaults are keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None). As you can see, the initial values are very close to 0, and half of them (-0.05 to 0) don't activate at all. The ones that do activate (0 to 0.05) propagate gradients very, very slowly.

My suggestion would be to change the initialization to a positive range (the operating range of ReLU), and your model should converge quickly.
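The effect described above can be illustrated with a toy NumPy sketch (not Keras; the input distribution in [0, 1] and the layer sizes are assumptions chosen to mimic an MNIST batch): with a symmetric init around 0, roughly half of the hidden units produce a zero ReLU output, while a positive-only init keeps essentially all of them active.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# A batch of MNIST-sized inputs with pixel values in [0, 1].
x = rng.uniform(0.0, 1.0, size=(128, 784))

# Keras' random_uniform defaults: RandomUniform(minval=-0.05, maxval=0.05).
w_default = rng.uniform(-0.05, 0.05, size=(784, 512))
# The suggested fix: weights only in ReLU's positive operating range.
w_positive = rng.uniform(0.0, 0.05, size=(784, 512))

# Fraction of hidden units whose ReLU output is non-zero (i.e. active).
frac_default = float(np.mean(relu(x @ w_default) > 0))   # about half
frac_positive = float(np.mean(relu(x @ w_positive) > 0))  # essentially all
```

This only demonstrates the dead-unit argument the answer makes; in practice a ReLU-friendly initializer such as 'he_uniform' is another common choice.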

Answer 2 (score: 0)

The reason you're not converging is that you need to tune the model's hyperparameters, for example via cross-validation. With the Adagrad optimizer, try setting the learning rate to 1e-3 instead of the default 1e-2:

model2.compile(optimizer=Adagrad(lr=1e-3), loss='categorical_crossentropy', metrics=['accuracy'])

You'll see that the model starts to learn much better. The initialization and dropout rate are also contributing factors and need to be tuned, as mentioned in the other answer.
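For reference, the learning rate passed to Adagrad() scales the standard Adagrad update, in which each step is the gradient divided by the root of the accumulated squared gradients. A minimal sketch of that rule on a toy 1-D quadratic (adagrad_minimize is an illustrative helper, not a Keras API, and the objective is made up for the demo):

```python
import math

def adagrad_minimize(grad_fn, w0, lr, steps=2000, eps=1e-7):
    """Minimize a 1-D function with the standard Adagrad rule:
    w <- w - lr * g / (sqrt(sum of squared past gradients) + eps)."""
    w = float(w0)
    g2_sum = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        g2_sum += g * g
        w -= lr * g / (math.sqrt(g2_sum) + eps)
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3);
# the minimum is at w = 3.
grad = lambda w: 2.0 * (w - 3.0)
w_opt = adagrad_minimize(grad, w0=0.0, lr=0.5)
```

Because the accumulated squared gradients only grow, the effective step size shrinks over training; that is why the initial lr value matters so much for how far Adagrad can move.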