2017-11-18

TensorFlow GPU error: InvalidArgumentError: Cannot assign a device for operation 'MatMul'

I am testing TensorFlow with a GPU in Anaconda, using the test code given on the TensorFlow website:

import tensorflow as tf 
with tf.device('/device:GPU:0'): 
    a = tf.constant([1,2,3,4,5,6],shape=[2,3],name='a') 
    b = tf.constant([1,2,3,4,5,6],shape=[3,2],name='b') 
    c = tf.matmul(a,b) 
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) 
print(sess.run(c)) 

I created an Anaconda environment and installed TensorFlow with GPU support using pip install tensorflow-gpu. I ran the code above in an IPython notebook and keep getting this error:

InvalidArgumentError: Cannot assign a device for operation 'MatMul': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available. 
    [[Node: MatMul = MatMul[T=DT_INT32, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](a, b)]] 

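It looks like the MatMul op cannot be placed on the GPU. I don't understand why there is no supported kernel for GPU devices, since CUDA and cuDNN are installed correctly. Besides, the TensorFlow log shows that the GPUs are recognized: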

name: GeForce GTX 1080 Ti 
major: 6 minor: 1 memoryClockRate (GHz) 1.683 
pciBusID 0000:02:00.0 
Total memory: 10.91GiB 
Free memory: 10.75GiB 
2017-11-17 19:12:50.212054: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x55a56f0c2420 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that. 
2017-11-17 19:12:50.213035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 1 with properties: 
name: GeForce GTX 1080 Ti 
major: 6 minor: 1 memoryClockRate (GHz) 1.683 
pciBusID 0000:82:00.0 
Total memory: 10.91GiB 
Free memory: 10.75GiB 
2017-11-17 19:12:50.213089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 0 and 1 
2017-11-17 19:12:50.213108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 1 and 0 
2017-11-17 19:12:50.213132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 1 
2017-11-17 19:12:50.213148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y N 
2017-11-17 19:12:50.213156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 1: N Y 
2017-11-17 19:12:50.213169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0) 
2017-11-17 19:12:50.213179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0) 
Device mapping: 
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0 
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0 
2017-11-17 19:12:50.471348: I tensorflow/core/common_runtime/direct_session.cc:300] Device mapping: 
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0 
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0 

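There are two GPUs, and both show the same problem. The CUDA and cuDNN libraries are installed correctly, and the environment variables are set in Anaconda. The CUDA sample deviceQuery compiles and runs without errors and reports Result = PASS. Also, MatMul can be placed on the CPU and the computation completes. The constants a and b in the program can be placed on the GPU device. TensorFlow then prints these messages: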

2017-11-17 20:27:25.965655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0) 
2017-11-17 20:27:25.965665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0) 
Device mapping: 
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0 
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0 
2017-11-17 20:27:26.228395: I tensorflow/core/common_runtime/direct_session.cc:300] Device mapping: 
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0 
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0 

MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0 
2017-11-17 20:27:26.229489: I tensorflow/core/common_runtime/simple_placer.cc:872] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0 
b: (Const): /job:localhost/replica:0/task:0/gpu:0 
2017-11-17 20:27:26.229512: I tensorflow/core/common_runtime/simple_placer.cc:872] b: (Const)/job:localhost/replica:0/task:0/gpu:0 
a: (Const): /job:localhost/replica:0/task:0/gpu:0 
2017-11-17 20:27:26.229526: I tensorflow/core/common_runtime/simple_placer.cc:872] a: (Const)/job:localhost/replica:0/task:0/gpu:0 

I have reinstalled the NVIDIA driver, CUDA, and Anaconda several times, but that never solved the problem. Any suggestions would be much appreciated.

  • OS platform and distribution: Linux Ubuntu 16.04
  • TensorFlow installed from: binary
  • TensorFlow version: 1.3
  • Python version: 2.7.14
  • GCC/compiler version (if compiling from source): 5.4.0
  • NVIDIA driver: 384.98
  • CUDA/cuDNN version: CUDA 8.0 / cuDNN 6.0
  • GPU model and memory: GeForce 1080 Ti

Answer

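You are trying to multiply tensors of the tf.int32 (DT_INT32) data type on the GPU. The error message is saying that multiplying DT_INT32 tensors is not supported on the GPU: no GPU kernel is registered for MatMul with T=DT_INT32.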
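If you want to confirm the dtype yourself, here is a minimal sketch (not taken from the tutorial) that inspects what tf.constant infers from integer literals and converts the tensor with tf.cast. The names a_int and a_float are made up for illustration.

```python
import tensorflow as tf

# Integer literals make tf.constant infer an integer dtype (tf.int32).
a_int = tf.constant([1, 2, 3, 4, 5, 6], shape=[2, 3], name='a_int')

# tf.cast converts an existing tensor to a floating-point dtype.
a_float = tf.cast(a_int, tf.float32)

print(a_int.dtype.name)    # dtype inferred from the integer literals
print(a_float.dtype.name)  # dtype after the explicit cast
```

Checking .dtype before calling tf.matmul is a cheap way to see which kernel TensorFlow will look for.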

Note that the code on the website uses floating-point tensors (tf.float32), assuming you are talking about the code at https://www.tensorflow.org/tutorials/using_gpu.

Change:

a = tf.constant([1,2,3,4,5,6],shape=[2,3],name='a') 

to:

a = tf.constant([1.,2.,3.,4.,5.,6.],shape=[2,3],name='a') 

or:

a = tf.constant([1,2,3,4,5,6],shape=[2,3],name='a',dtype=tf.float32) 

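and do the same for b. That should make the error go away, since there is certainly a kernel that supports matrix multiplication of float32 tensors on the GPU.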
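Putting the change together, the full test script might look like the sketch below. Two things in it are my assumptions rather than part of the question: the tf1 alias, which exists only so the same script runs on TensorFlow 1.x and on 2.x through tf.compat.v1, and allow_soft_placement=True, which lets the script fall back to the CPU on a machine that has no GPU.

```python
import tensorflow as tf

# Assumption: on TensorFlow 2.x, Session and ConfigProto live under
# tf.compat.v1; on old 1.x releases they are available on tf directly.
tf1 = getattr(tf.compat, "v1", tf)

graph = tf.Graph()
with graph.as_default():
    with tf.device('/device:GPU:0'):
        # Float literals make both constants tf.float32.
        a = tf.constant([1., 2., 3., 4., 5., 6.], shape=[2, 3], name='a')
        b = tf.constant([1., 2., 3., 4., 5., 6.], shape=[3, 2], name='b')
        c = tf.matmul(a, b)

# log_device_placement prints where each op runs.
# allow_soft_placement (my addition) falls back to another device
# when the requested one is missing or has no kernel for the op.
config = tf1.ConfigProto(log_device_placement=True,
                         allow_soft_placement=True)
with tf1.Session(graph=graph, config=config) as sess:
    result = sess.run(c)

print(result)
```

On a machine like yours, the placement log should now list MatMul on a GPU device. Keep in mind that allow_soft_placement=True would also hide the original int32 error by quietly placing MatMul on the CPU, so drop it if you want placement failures to be reported.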

Hope that helps.

Great explanation. That solved it, thanks! – Xinzhou
