2016-12-06

Keras + TensorFlow optimization stalls

I have installed Theano (TH), TensorFlow (TF), and Keras. Basic tests seem to indicate that they work with the GPU (GTX 1070), CUDA 8.0, and cuDNN 5.1.

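If I run the cifar10_cnn.py Keras example with TH as the backend, it seems to work fine, at about 18s/epoch. If I run it with TF, then almost every time (it occasionally works, but I cannot reproduce that) the optimization stalls at acc = 0.1 after every epoch. It is as if the weights were not being updated.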

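That is a shame, because the TF backend takes only about 10s/epoch (even on the very few runs where it worked). I am using Conda, and I am very new to Python. In case it helps, "conda list" seems to show two versions for some packages.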
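For anyone wanting to repeat the TH versus TF comparison from one shell, here is a minimal sketch. It relies on the `KERAS_BACKEND` environment variable, which Keras reads at import time; the helper names `backend_env` and `run_with_backend` are hypothetical and are not part of Keras or of the example script.

```python
import os
import subprocess
import sys


def backend_env(backend):
    """Return a copy of the current environment with KERAS_BACKEND set."""
    env = dict(os.environ)
    env["KERAS_BACKEND"] = backend  # "theano" or "tensorflow"
    return env


def run_with_backend(script, backend):
    """Run `script` in a fresh Python process under the given Keras backend."""
    result = subprocess.run([sys.executable, script], env=backend_env(backend))
    return result.returncode


# Example (assumes cifar10_cnn.py is in the current directory):
# run_with_backend("cifar10_cnn.py", "theano")
# run_with_backend("cifar10_cnn.py", "tensorflow")
```

Starting a fresh process for each run matters because the backend is fixed once Keras has been imported.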

If you have any clue, please let me know. Thanks. Output below:

python cifar10_cnn.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
X_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
Using real-time data augmentation.
Epoch 1/200
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.60GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
50000/50000 [==============================] - 11s - loss: 2.3029 - acc: 0.0999 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 2/200
50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0980 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 3/200
50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0992 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 4/200
50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0980 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 5/200
13184/50000 [======>.......................] - ETA: 7s - loss: 2.3026 - acc: 0.1044^CTraceback (most recent call last):

Answers

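It looks to me like the model is just guessing at random: there are 10 classes, and it is right 10% of the time. The only thing I can think of is that your learning rate is a bit too high. I have seen models with a high learning rate sometimes converge and sometimes not. As for the backends, I think Theano performs more optimizations, so that might affect things slightly. Try lowering the learning rate by a factor of 10 and see whether it converges.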
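As a quick sanity check on the "random guessing" reading, the loss a 10-class classifier gets when it predicts a uniform distribution can be computed directly. A minimal sketch, assuming the loss in the log is categorical cross-entropy in natural-log units:

```python
import math

num_classes = 10  # CIFAR-10 has ten classes

# Cross-entropy of a uniform prediction: -ln(1 / num_classes) = ln(num_classes)
chance_loss = -math.log(1.0 / num_classes)
chance_acc = 1.0 / num_classes

print(round(chance_loss, 4), chance_acc)  # prints: 2.3026 0.1
```

These values match the val_loss of 2.3026 and val_acc of 0.1000 in the question's log, which is consistent with the network outputting near-uniform probabilities.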

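Thanks, I lowered the learning rate to 0.001 and that seems to have worked. I assumed an example on GitHub would work "out of the box", but maybe it was only tested on TH. Thanks again. – ozne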
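To illustrate why a tenfold change in learning rate can decide whether training makes progress, here is a toy sketch. It is not the Keras example and does not reproduce the CIFAR-10 stall; it runs plain gradient descent on a 1-D quadratic, and the curvature, starting point, and step count are hypothetical values chosen for illustration.

```python
def gradient_descent(lr, curvature=10.0, w0=1.0, steps=50):
    """Minimise f(w) = 0.5 * curvature * w**2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = curvature * w   # f'(w)
        w = w - lr * grad      # each step multiplies w by (1 - lr * curvature)
    return w


high = gradient_descent(lr=0.25)   # |1 - 2.5| = 1.5 > 1: iterates grow
low = gradient_descent(lr=0.025)   # |1 - 0.25| = 0.75 < 1: iterates shrink

print(abs(high) > 1.0, abs(low) < 1e-3)  # prints: True True
```

A real network is far less tidy than this, but the same sensitivity is why lowering the learning rate by a factor of 10 is a reasonable first thing to try.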