I have been using a multi-GPU system with TensorFlow.

However, at some point the following code started running on the CPU only.

import tensorflow as tf
tf.debugging.set_log_device_placement(True)
strategy = tf.distribute.MirroredStrategy()

Moreover, the physical device check returns an empty list:

tf.config.list_physical_devices('GPU')

nvidia-smi lists the GPUs correctly, as shown in the following screenshot: [image]

Environment:

NVIDIA-SMI: 418.87.00

Driver version: 418.87.00

CUDA version: 10.1

TensorFlow: 2.4.1

cuDNN: [image]

How do I handle this issue?

1
  • Any system update? You may want to check and reinstall your GPU drivers. Commented Mar 17, 2021 at 8:46

2 Answers

1

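TensorFlow 2.4 is built against CUDA 11.0 and cuDNN 8.0, so it cannot use the CUDA 10.1 libraries you have installed.

So, upgrade cuDNN and CUDA. Note that CUDA 11.0 also requires an NVIDIA driver from the 450.x series or newer, so the 418.87 driver needs to be updated as well.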
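
To make the mismatch concrete, here is a minimal sketch of a lookup based on the tested build configurations that TensorFlow publishes for its Linux GPU packages. The table is deliberately partial, and the names `TESTED_BUILDS`, `required_versions`, and `cuda_matches` are made up for this illustration.

```python
# Partial table of tested Linux GPU build configurations (from the TensorFlow docs).
# Keys are TensorFlow minor versions; values are (CUDA, cuDNN) versions.
TESTED_BUILDS = {
    "2.1": ("10.1", "7.6"),
    "2.2": ("10.1", "7.6"),
    "2.3": ("10.1", "7.6"),
    "2.4": ("11.0", "8.0"),
    "2.5": ("11.2", "8.1"),
}


def required_versions(tf_version):
    """Return (cuda, cudnn) for a TensorFlow version string such as '2.4.1'."""
    minor = ".".join(tf_version.split(".")[:2])
    if minor not in TESTED_BUILDS:
        raise KeyError(f"TensorFlow {minor} is not in this partial table")
    return TESTED_BUILDS[minor]


def cuda_matches(tf_version, installed_cuda):
    """Return True if the installed CUDA version is the one the build expects."""
    required_cuda, _ = required_versions(tf_version)
    return installed_cuda == required_cuda


print(required_versions("2.4.1"))     # ('11.0', '8.0')
print(cuda_matches("2.4.1", "10.1"))  # False
print(cuda_matches("2.3.0", "10.1"))  # True
```

With the versions from the question (TensorFlow 2.4.1 and CUDA 10.1) the check fails, while TensorFlow 2.3 matches CUDA 10.1. So either upgrading CUDA and cuDNN or downgrading TensorFlow should bring the two back in line.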

If you are not using Anaconda, update the system paths and make sure they do not point to a previous CUDA version.

For example, check which version the toolkit on your path reports:

/usr/local/cuda/bin/nvcc --version
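
You can also check from Python whether the dynamic loader finds the CUDA 11 libraries at all. This is only a rough sketch using the standard library: the library file names below are an assumption about what a TensorFlow 2.4 GPU build tries to open on Linux, and `check_libraries` is a hypothetical helper, not part of TensorFlow.

```python
import ctypes

# Assumed library names for a CUDA 11.0 / cuDNN 8 setup on Linux.
ASSUMED_LIBRARIES = ["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"]


def check_libraries(names):
    """Map each library name to True if the dynamic loader can open it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be loaded
            results[name] = True
        except OSError:
            results[name] = False
    return results


for name, found in check_libraries(ASSUMED_LIBRARIES).items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```

If a library is reported as not found even though it is installed, the usual suspect is a library path (for example `LD_LIBRARY_PATH`) that still points to the old CUDA directory.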

Conda install:

# conda update --force conda ## if needed
# conda update conda ## if needed
conda activate <env>
conda install cudatoolkit
conda install -c anaconda cudnn
conda list cuda
conda list cudnn

Here is a script for a manual install, which you will probably need even if you are using conda:

wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt-get update

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-11-0 \
    libcudnn8=8.0.4.30-1+cuda11.0  \
    libcudnn8-dev=8.0.4.30-1+cuda11.0


# Install TensorRT. Requires that libcudnn8 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
    libnvinfer-dev=7.1.3-1+cuda11.0 \
    libnvinfer-plugin7=7.1.3-1+cuda11.0

1 Comment

Thank you for your help. From your answer I understood that my TensorFlow version was not compatible with the other installed software. I downgraded TensorFlow, and that solved the issue.
1

Have you changed anything in your environment?

I would suggest installing CUDA 11 and cuDNN 8.0 when using TensorFlow 2.4.0 or later.

Then give it a try. Hope your problem will be resolved.
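
After reinstalling, a quick check along these lines may help confirm what TensorFlow actually sees. It is a sketch, not an official diagnostic: `describe_gpu_setup` is a hypothetical helper name, and `tf.sysconfig.get_build_info()` is assumed to be available (it exists in TensorFlow 2.3 and later).

```python
def describe_gpu_setup():
    """Collect what TensorFlow reports about its CUDA build and visible GPUs."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": None}

    build = tf.sysconfig.get_build_info()  # assumed: TensorFlow 2.3 or later
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # These keys are only expected on GPU builds, hence .get().
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


for key, value in describe_gpu_setup().items():
    print(f"{key}: {value}")
```

If `gpus` is still empty while `built_with_cuda` is true, compare the reported `cuda_version` and `cudnn_version` with what is installed on the machine.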
