I installed CUDA 8.0 and the GPU version of TensorFlow on Ubuntu 16.04. It initially worked fine and used the GPU, but it has suddenly stopped doing so. I installed TensorFlow through pip, and it was definitely the GPU build, since it did use the GPU at first.

The message I get while importing tensorflow is:

>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/lib/x86_64-linux-gnu
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

So it is clearly able to locate the CUDA libraries through LD_LIBRARY_PATH (everything except libcudnn.so.5). But when I create a session, I get the following output:

>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: naman-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: naman-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 375.39.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.39  Tue Jan 31 20:47:00 PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 375.39.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 375.39.0
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:

So it is not able to locate the GPU. nvidia-smi gives the following output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Graphics Device     Off  | 0000:01:00.0      On |                  N/A |
| 23%   41C    P8    11W / 250W |    337MiB / 11169MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1005    G   /usr/lib/xorg/Xorg                             197MiB |
|    0      2032    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd    89MiB |
|    0     30355    G   compiz                                          37MiB |
+-----------------------------------------------------------------------------+

I browsed other questions on Stack Overflow, but they mostly suggest checking LD_LIBRARY_PATH or nvidia-smi. Both look as expected for me, so I can't work out what the issue is.
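
The error in the session log comes from the cuInit call, so one way to narrow this down is to call cuInit directly, without TensorFlow involved. This is a minimal sketch, assuming Python's standard ctypes module and that the driver library is reachable under the name libcuda.so.1 (the name reported in the import log). The helper name cu_init_status is made up for illustration.

```python
import ctypes


def cu_init_status(libname="libcuda.so.1"):
    """Return the integer CUresult of cuInit(0), or None if the library cannot be loaded.

    `libname` defaults to the driver library name seen in the import log.
    """
    try:
        libcuda = ctypes.CDLL(libname)
    except OSError:
        # The dynamic loader could not find or open the library.
        return None
    # CUresult cuInit(unsigned int flags); the flags argument must be 0.
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


status = cu_init_status()
if status is None:
    print("libcuda.so.1 could not be loaded")
elif status == 0:
    print("cuInit succeeded (CUDA_SUCCESS)")
else:
    # 999 is the numeric value of CUDA_ERROR_UNKNOWN in cuda.h.
    print("cuInit failed with CUresult", status)
```

If this also reports a non-zero result, the failure would seem to sit in the driver setup rather than in TensorFlow or cuDNN.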

EDIT: I also tried installing cuDNN 5 and adding it to LD_LIBRARY_PATH. TensorFlow now loads it successfully, but I still get the same error when creating a session.
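
To confirm which of the libraries named in the import log the dynamic loader can resolve from the current environment, a small check along these lines may help. It is only a sketch: it assumes the standard ctypes module, the library names are copied from the log above, and can_load is a hypothetical helper name.

```python
import ctypes


def can_load(libname):
    """Return True if the dynamic loader can open `libname`, else False."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False


# Library names taken from the TensorFlow import log.
libraries = [
    "libcublas.so.8.0",
    "libcudnn.so.5",
    "libcufft.so.8.0",
    "libcuda.so.1",
    "libcurand.so.8.0",
]

results = {name: can_load(name) for name in libraries}
for name, ok in results.items():
    print(name, "loaded" if ok else "NOT found")
```

The search uses the same loader rules (including LD_LIBRARY_PATH) as the process it runs in, so it should be started from the same shell as the failing Python session.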

  • You need to install the cuDNN library. Commented Apr 19, 2017 at 6:10
  • @Drop I installed it, and I know that my LD_LIBRARY_PATH is not pointing to it. But shouldn't this still run without it? I am sure it was running without it, but somehow something got screwed up later. Commented Apr 19, 2017 at 6:12
  • @Drop Also, I don't know why it wants only libcudnn.so.5 and not 6. I already have 6 installed and I didn't want to downgrade. Commented Apr 19, 2017 at 6:16
  • What I see is the log saying that cuDNN is missing. It asks for version 5 because your distribution of TF was linked against that version. You may rebuild TF against v6 if you wish (not sure yet if it is supported, though). Also check if any of these are enabled, preventing TF from seeing GPUs. Commented Apr 19, 2017 at 17:01
  • Another strange thing is that nvidia-smi cannot resolve the name of the device (I see "Graphics Device" there). 250 W and 12 GB: is it a Titan X or a Tesla? You may also want to check that the driver is installed correctly. Commented Apr 19, 2017 at 17:05

1 Answer

Simply rename "cudnn64_6.dll" to "cudnn64_5.dll".

2 Comments

How/why would this fix the problem? Can you please expand your answer a bit into something more useful?
When you downloaded the cuDNN 6.0 zip file, you found a file named "cudnn64_6.dll" inside the bin folder, right? Rename that to "cudnn64_5.dll" and everything should work, provided you have actually installed the tensorflow-gpu version.
