
I am trying to run some neural network code in Python. I had it working okay in Google Colab. I then moved the code to a Jupyter Notebook on a remote machine with a GPU.

It runs okay until I try to fit the model using:

history = model.fit_generator(generator=training_generator, validation_data=validation_generator, use_multiprocessing=True, workers=1, epochs=100, shuffle=True, verbose=1)

The full error message follows. I just don't know where to begin understanding what it means, so I'm looking for help. Thanks in advance:

UnknownError                              Traceback (most recent call last)
<ipython-input-15-d3d33225fec8> in <module>
      1 # Train model on dataset
----> 2 history = model.fit_generator(generator=training_generator, validation_data=validation_generator, use_multiprocessing=True, workers=1, epochs=100, shuffle=True, verbose=1)

~/miniconda3/lib/python3.7/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~/miniconda3/lib/python3.7/site-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1416             use_multiprocessing=use_multiprocessing,
   1417             shuffle=shuffle,
-> 1418             initial_epoch=initial_epoch)
   1419 
   1420     @interfaces.legacy_generator_methods_support

~/miniconda3/lib/python3.7/site-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    215                 outs = model.train_on_batch(x, y,
    216                                             sample_weight=sample_weight,
--> 217                                             class_weight=class_weight)
    218 
    219                 outs = to_list(outs)

~/miniconda3/lib/python3.7/site-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1215             ins = x + y + sample_weights
   1216         self._make_train_function()
-> 1217         outputs = self.train_function(ins)
   1218         return unpack_singleton(outputs)
   1219 

~/miniconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2713                 return self._legacy_call(inputs)
   2714 
-> 2715             return self._call(inputs)
   2716         else:
   2717             if py_any(is_tensor(x) for x in inputs):

~/miniconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
   2673             fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
   2674         else:
-> 2675             fetched = self._callable_fn(*array_vals)
   2676         return fetched[:len(self.outputs)]
   2677 

~/miniconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1437           ret = tf_session.TF_SessionRunCallable(
   1438               self._session._session, self._handle, args, status,
-> 1439               run_metadata_ptr)
   1440         if run_metadata:
   1441           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    526             None, None,
    527             compat.as_text(c_api.TF_Message(self.status.status)),
--> 528             c_api.TF_GetCode(self.status.status))
    529     # Delete the underlying status object from memory otherwise it stays alive
    530     # as there is a reference to status from this from the traceback due to

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv2d_1/convolution}}]]
     [[{{node metrics/acc/Mean}}]]
  • I have seen this error whenever cuDNN is not compatible with the CUDA version. Have you checked if your cuDNN, CUDA, and driver are compatible? Commented Dec 5, 2019 at 22:26
  • I'm not sure how to, but thanks for giving me a place to start :) Commented Dec 5, 2019 at 22:47
  • This page might be useful: tensorflow.org/install/gpu. But it might involve some trial and error. Commented Dec 5, 2019 at 22:52

1 Answer


As @thushv89 says, this is an issue with compatibility between the TF binary and the installed cuDNN version.

You can check your TensorFlow version using:

python -c 'import tensorflow as tf; print(tf.__version__);'
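If you prefer to check from inside Python, here is a minimal sketch (assuming the TF 1.x API that appears in the traceback) that also confirms TensorFlow can actually see the GPU:

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)  # e.g. 1.13.1
# Should include a '/device:GPU:0' entry if the GPU build and driver are working
print([d.name for d in device_lib.list_local_devices()])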

Then check the required CUDA/cuDNN versions here: https://www.tensorflow.org/install/source#tested_build_configurations

Note: the indicated CUDA/cuDNN versions are only relevant for the official distribution of TF. If you installed TF through conda, there should be a better way to handle this, since conda typically pulls in its own cudatoolkit/cudnn packages.

Then you can check your CUDA version:

nvcc --version

Then check your cuDNN version using one of the following (depending on where the header is installed):

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2
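
Once the versions look consistent, a quick sanity check is to run a single small convolution on the GPU. This is just a sketch using the TF 1.x session API from the traceback; if cuDNN still cannot initialize, it should reproduce the same "Failed to get convolution algorithm" error:

import numpy as np
import tensorflow as tf

# Tiny NHWC input and a 3x3 kernel; the shapes are arbitrary, only the conv op matters
x = tf.constant(np.random.rand(1, 8, 8, 1).astype(np.float32))
k = tf.constant(np.random.rand(3, 3, 1, 4).astype(np.float32))
y = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    print(sess.run(y).shape)  # (1, 8, 8, 4) if cuDNN initializes correctly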