I am trying to install TensorFlow GPU support using the following guide:

https://www.tensorflow.org/install/gpu

I am on Ubuntu (20.04 LTS)

I've followed the instructions below for the latest Ubuntu (CUDA 11):

# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update

wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt-get update

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-11-0 \
    libcudnn8=8.0.4.30-1+cuda11.0  \
    libcudnn8-dev=8.0.4.30-1+cuda11.0

# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install TensorRT. Requires that libcudnn8 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
    libnvinfer-dev=7.1.3-1+cuda11.0 \
    libnvinfer-plugin7=7.1.3-1+cuda11.0

After running this and rebooting, I have CUDA 11 and cuDNN 8.

I then installed TensorFlow with a plain pip install tensorflow, since I understood from reading online that newer versions of TensorFlow no longer need tensorflow-gpu to be installed explicitly.

This is what I'm getting after trying to import tensorflow and check physical devices:

import tensorflow as tf

Result:

2021-06-02 16:04:03.347039: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0

tf.config.list_physical_devices('GPU')

Result:

2021-06-02 16:11:19.035743: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-06-02 16:11:19.067500: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-02 16:11:19.067753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.759GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
2021-06-02 16:11:19.067771: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-02 16:11:19.069485: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-06-02 16:11:19.069529: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-06-02 16:11:19.069625: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2021-06-02 16:11:19.069689: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2021-06-02 16:11:19.069736: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2021-06-02 16:11:19.069796: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-06-02 16:11:19.069930: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-06-02 16:11:19.069938: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

It seems TensorFlow is complaining about four missing shared libraries (.so files):

  • libcufft.so.10
  • libcurand.so.10
  • libcusolver.so.11
  • libcusparse.so.11

I've tried looking for these on my system using the locate command, but they do not exist anywhere.
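A quick way to double-check this from Python is to ask the dynamic loader directly. The sketch below is only illustrative: the helper names can_load and loadable_report are made up for this example, and it assumes that ctypes.CDLL resolving a name is a reasonable proxy for TensorFlow being able to open the same library.

```python
import ctypes

# Library names copied from the "Could not load dynamic library" warnings.
MISSING = [
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
]


def can_load(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the loader cannot find or open the file.
        return False


def loadable_report(names):
    """Map each library name to whether it could be loaded (hypothetical helper)."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    for name, ok in loadable_report(MISSING).items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```

Because the loader reads LD_LIBRARY_PATH when the process starts, run this from the same shell you use to start Python with TensorFlow.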

I haven't added anything to my .bashrc, since I wasn't sure what LD_LIBRARY_PATH should be.

2 Answers

Add these library paths to your ~/.bashrc file, source it, and then try again:

export PATH=/usr/local/cuda-11.0/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64:${LD_LIBRARY_PATH}
export CUDA_HOME=/usr/local/cuda

Please change your path according to your setup.
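To see which of those directories actually contains a given library, something like the following could help. It is a minimal sketch: find_in_library_path is a hypothetical helper, and it only inspects the directories listed in LD_LIBRARY_PATH, not the system defaults or the ld.so cache that the loader also searches.

```python
import os


def find_in_library_path(lib_name, search_path=None):
    """Return the directories in `search_path` that contain the file `lib_name`.

    `search_path` defaults to the current LD_LIBRARY_PATH value.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip the empty entries produced by leading or trailing separators.
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    for lib in ("libcufft.so.10", "libcurand.so.10",
                "libcusolver.so.11", "libcusparse.so.11"):
        print(lib, "->", find_in_library_path(lib) or "not on LD_LIBRARY_PATH")
```

If a library is reported as missing from every listed directory, changing the path will not help; the file itself still has to be installed.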

Comments

I have the directory /usr/local/cuda-11.0/ and also /usr/lib/cuda/, which I installed using sudo apt-get install nvidia-cuda-toolkit (do I even need that?). After adding this to .bashrc, two of the files were resolved, but two are still missing:
2021-06-02 16:50:20: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.0/lib64:/usr/local/cuda/lib64:
2021-06-02 16:50:20: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.0/lib64:/usr/local/cuda/lib64:
It seems some files are missing from /usr/local/cuda-11.0/lib64, specifically libcusolver.so.11 and libcusparse.so.11. Did I fail to install something?
You have a corrupted install. The instructions you posted in your question were correct. However, you should not also have run sudo apt-get install nvidia-cuda-toolkit; that command doesn't appear anywhere in those instructions. My suggestion is to start over with a fresh install of Ubuntu and follow the instructions you posted in your question.
It was a bit confusing because the guide asks you to install CUPTI and set the environment variable before these commands, and it mentions that CUPTI is included in the CUDA toolkit.

Try installing the missing libraries, then check again:

apt-get install -y cuda-command-line-tools-11-4 libcublas-11-4 libcufft-11-4 libcurand-11-4 libcusolver-11-4 libcusparse-11-4 

Note that the package suffix must match your CUDA version. In bash, with the version stored in a variable named CUDA, the suffix can be derived with:

${CUDA/./-}

For example, if your CUDA version is 11.2, the package is libcufft-11-2.
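The same substitution can be sketched in Python to build the full package list for a given version. The function name cuda_package_names is invented for this illustration, and whether each package exists for your version in the configured repository is an assumption you should verify with apt before installing.

```python
def cuda_package_names(cuda_version):
    """Build apt package names for a CUDA version string such as '11.2'."""
    # Mirrors the bash expansion ${CUDA/./-}: replace only the first dot.
    suffix = cuda_version.replace(".", "-", 1)
    bases = [
        "cuda-command-line-tools",
        "libcublas",
        "libcufft",
        "libcurand",
        "libcusolver",
        "libcusparse",
    ]
    return [f"{base}-{suffix}" for base in bases]


if __name__ == "__main__":
    # Prints a space-separated list suitable for pasting after apt-get install.
    print(" ".join(cuda_package_names("11.0")))
```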
