2016-08-04

We have read the TensorFlow paper's discussion of scheduling. It can pre-execute the graph and find the "right" devices on which to place each operation. Can TensorFlow automatically schedule operations across all available GPUs?

However, we tested running with tf.Session(config=tf.ConfigProto(log_device_placement=True)) without specifying any device, and we found that all operations were placed on the first GPU.
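Roughly what we ran (a minimal sketch with a toy model; the real graph is larger and the names are illustrative):

import tensorflow as tf

# A toy graph with no explicit device pinning.
x = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.zeros([10, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

# Ask TensorFlow to log where every op ends up being placed.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(tf.initialize_all_variables())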

The log looks like this.

Adam/epsilon: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/epsilon: /job:localhost/replica:0/task:0/gpu:0 
Adam/beta2: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/beta2: /job:localhost/replica:0/task:0/gpu:0 
Adam/beta1: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/beta1: /job:localhost/replica:0/task:0/gpu:0 
Adam/learning_rate: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/learning_rate: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam_1: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam_1: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam/read: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam/read: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam_1: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam_1: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam/read: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam/read: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0 
Variable_1/Adam_1: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_1/Adam_1: /job:localhost/replica:0/task:0/gpu:0 

The Variables are also placed on the GPU. I suspect the scheduler is not smart enough, and that the best practice is for users to explicitly specify which operations run on the CPU or GPU, especially when there are multiple GPUs. Is that right?
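For illustration, the explicit pinning we have in mind would look roughly like this (a minimal sketch with made-up shapes and device indices, not our real model):

import tensorflow as tf

# Keep the variable on the CPU and run the matmul on a specific GPU.
with tf.device("/cpu:0"):
    w = tf.Variable(tf.zeros([10, 1]))

with tf.device("/gpu:1"):
    x = tf.placeholder(tf.float32, [None, 10])
    y = tf.matmul(x, w)

# allow_soft_placement lets TensorFlow fall back to another device if an
# op has no kernel for the requested one (or the GPU does not exist).
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True, log_device_placement=True))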

Answer

As of v0.9, TensorFlow places all operations on the first GPU you have, so what you are observing is 100% expected. Now, if your question is "Can TensorFlow automatically distribute my graph across my 4 GPUs without any intervention on my part?", then as of August 2016 the answer is no.

If you are trying to harness the power of all the GPUs available on your local machine, take a look at this variation of the cifar10 tutorial. The next step up would be replicated training with distributed TensorFlow, but that is probably overkill for what you are trying to do. Given all the recent progress on virtualization, the question of which device a particular operation is assigned to may soon become irrelevant.
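At a high level, that tutorial follows the "tower" pattern: build one copy of the model per GPU, compute gradients on each tower, then average and apply them from the CPU. A simplified sketch of the idea (NUM_GPUS and tower_loss are illustrative placeholders, not the tutorial's actual code):

import tensorflow as tf

NUM_GPUS = 4  # assumption: four local GPUs

def tower_loss(x):
    # Illustrative stand-in for the real model: one shared linear layer + MSE.
    w = tf.get_variable("w", [10, 1], initializer=tf.constant_initializer(0.0))
    return tf.reduce_mean(tf.square(tf.matmul(x, w)))

opt = tf.train.AdamOptimizer(0.001)
tower_inputs, tower_grads = [], []

# One "tower" per GPU; every tower shares the same variables.
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(NUM_GPUS):
        with tf.device("/gpu:%d" % i):
            x_i = tf.placeholder(tf.float32, [None, 10])  # this tower's batch slice
            tower_inputs.append(x_i)
            tower_grads.append(opt.compute_gradients(tower_loss(x_i)))
            tf.get_variable_scope().reuse_variables()

# Average the per-tower gradients and apply them once, on the CPU.
with tf.device("/cpu:0"):
    averaged = [(tf.add_n([g for g, _ in gv]) / NUM_GPUS, gv[0][1])
                for gv in zip(*tower_grads)]
    train_op = opt.apply_gradients(averaged)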

Great, thanks for the detailed explanation. We will use 'CUDA_VISIBLE_DEVICES' and 'tf.device()' to place things flexibly. – tobe
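For reference, CUDA_VISIBLE_DEVICES can also be set from Python, as long as it happens before the first session is created (an illustrative snippet; the GPU index is just an example):

import os

# Expose only physical GPU 1 to this process; inside TensorFlow it then
# shows up as /gpu:0. Must be set before TensorFlow initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))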
