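Running TensorFlow on a GPU cluster in a virtualenv
I installed the GPU version of TensorFlow in a virtualenv following these instructions. The problem is that I get a segmentation fault when starting a session. That is, this code: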
import tensorflow as tf
sess = tf.InteractiveSession()
exits with the following error:
(tesnsorflowenv)[email protected]$ python testtensorflow.py
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:93] Couldn't open CUDA library libcudnn.so.6.5. LD_LIBRARY_PATH: :/vol/cuda/7.0.28/lib64
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1382] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 40
Segmentation fault
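The log above reports that libcudnn.so.6.5 could not be opened while the other CUDA libraries loaded. Whether that is related to the segmentation fault is not established here. As a minimal sketch, the same lookup can be reproduced outside TensorFlow with the standard ctypes module. The helper name check_libs is hypothetical, and the library names are copied from the log.

```python
import ctypes

# Library names copied from the log above. That the missing cuDNN library
# has anything to do with the crash is an assumption, not a finding.
CUDA_LIBS = [
    "libcublas.so.7.0",
    "libcudnn.so.6.5",
    "libcufft.so.7.0",
    "libcuda.so",
    "libcurand.so.7.0",
]


def check_libs(names):
    """Return {name: bool} telling whether the dynamic loader can open each library."""
    result = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # uses the same search path (LD_LIBRARY_PATH etc.) as the process
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in check_libs(CUDA_LIBS).items():
        print(name, "OK" if ok else "NOT FOUND")
```

Run inside the same virtualenv and shell as the failing script, this shows which libraries are reachable through the current LD_LIBRARY_PATH.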
I tried digging deeper with gdb, but only got the following additional output:
[New Thread 0x7fffdf880700 (LWP 32641)]
[New Thread 0x7fffdf07f700 (LWP 32642)]
... lines omitted
[New Thread 0x7fffadffb700 (LWP 32681)]
[Thread 0x7fffadffb700 (LWP 32681) exited]
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ??()
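Since gdb only shows a jump to address 0 with no symbols, a Python-level traceback may at least show which Python call was active when the signal arrived. The sketch below assumes Python 3.3 or newer, where the faulthandler module is part of the standard library and can be enabled with the -X faulthandler interpreter option. The helper name run_with_faulthandler is hypothetical. It does not produce a native (C++) stack trace.

```python
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run Python in a child process with faulthandler enabled.

    Returns (returncode, stderr). On POSIX a negative return code -N means
    the child was terminated by signal N.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler"] + list(script_args),
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # testtensorflow.py is the script name taken from the shell prompt above.
    code, err = run_with_faulthandler(["testtensorflow.py"])
    print("exit code:", code)
    print(err)
```

If the child crashes, faulthandler writes the Python traceback of the active thread to stderr before the process dies, and that text is returned as the second value.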
Any ideas what is happening here and how to fix it?
Here is the output of nvidia-smi:
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:06:00.0     Off |                    0 |
| N/A   65C    P0     142W/149W |      235MiB/11519MiB |     81%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:07:00.0     Off |                    0 |
| N/A   25C    P8      30W/149W |       55MiB/11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:0D:00.0     Off |                    0 |
| N/A   27C    P8      26W/149W |       55MiB/11519MiB |      0%   Prohibited |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:0E:00.0     Off |                    0 |
| N/A   25C    P8      28W/149W |       55MiB/11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 0000:86:00.0     Off |                    0 |
| N/A   46C    P0      85W/149W |      206MiB/11519MiB |     97%   E. Process |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 0000:87:00.0     Off |                    0 |
| N/A   27C    P8      29W/149W |       55MiB/11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 0000:8D:00.0     Off |                    0 |
| N/A   28C    P8      26W/149W |       55MiB/11519MiB |      0%   Prohibited |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 0000:8E:00.0     Off |                    0 |
| N/A   23C    P8      30W/149W |       55MiB/11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
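The table shows two GPUs in Prohibited compute mode (2 and 6) and the rest in Exclusive Process mode, with GPUs 0 and 4 already busy. In Exclusive Process mode only one process may hold a context on a device, and in Prohibited mode none may. It is only a guess that this matters for the crash, but it may be worth restricting the process to an idle, permitted GPU through the CUDA_VISIBLE_DEVICES environment variable. The sketch below filters a three-column listing; the helper name pick_usable_gpus, the max_util threshold, and the sample text (transcribed by hand from the table) are assumptions for illustration.

```python
import os


def pick_usable_gpus(csv_text, max_util=5):
    """Return indices of GPUs that are not Prohibited and are nearly idle.

    csv_text holds one "index, compute_mode, utilization" line per GPU, the
    shape that a query such as
      nvidia-smi --query-gpu=index,compute_mode,utilization.gpu --format=csv,noheader,nounits
    is expected to print (assumption: the installed nvidia-smi supports it).
    """
    usable = []
    for line in csv_text.strip().splitlines():
        index, mode, util = [field.strip() for field in line.split(",")]
        if mode.lower() == "prohibited":
            continue  # no process may create a context on this device
        try:
            if int(util) > max_util:
                continue  # likely already claimed by another process
        except ValueError:
            continue  # utilization not reported; skip to be safe
        usable.append(int(index))
    return usable


# Values transcribed by hand from the nvidia-smi table above.
SAMPLE = """\
0, Exclusive_Process, 81
1, Exclusive_Process, 0
2, Prohibited, 0
3, Exclusive_Process, 0
4, Exclusive_Process, 97
5, Exclusive_Process, 0
6, Prohibited, 0
7, Exclusive_Process, 0
"""

if __name__ == "__main__":
    gpus = pick_usable_gpus(SAMPLE)
    print(gpus)  # [1, 3, 5, 7]
    if gpus:
        # Must be set before TensorFlow is imported to take effect.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpus[0])
```

The variable has to be set before `import tensorflow` runs, either in the script as above or in the shell that launches it.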
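Thanks for any help on this!
Please try building from source following the instructions [here](https://www.tensorflow.org/versions/master/get_started/os_setup.html#installing-from-sources), preferably in debug mode, and provide a full stack trace. That may help pinpoint the source of the SIGSEGV. – keveman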