I'm experimenting with the TensorFlow 2.0 alpha. Training works as expected when I feed NumPy arrays directly, but when I use a tf.data.Dataset instead, I get an input dimension error. I'm using the iris dataset as the simplest example to demonstrate this:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import tensorflow as tf
from tensorflow.python import keras
iris = datasets.load_iris()
scl = StandardScaler()
ohe = OneHotEncoder(categories='auto')
data_norm = scl.fit_transform(iris.data)
data_target = ohe.fit_transform(iris.target.reshape(-1,1)).toarray()
train_data, val_data, train_target, val_target = train_test_split(data_norm, data_target, test_size=0.1)
train_data, test_data, train_target, test_target = train_test_split(train_data, train_target, test_size=0.2)
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_target))
train_dataset.batch(32)
test_dataset = tf.data.Dataset.from_tensor_slices((test_data, test_target))
test_dataset.batch(32)
val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_target))
val_dataset.batch(32)
mdl = keras.Sequential([
    keras.layers.Dense(16, input_dim=4, activation='relu'),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(3, activation='sigmoid')
])
mdl.compile(
    optimizer=keras.optimizers.Adam(0.01),
    loss=keras.losses.categorical_crossentropy,
    metrics=[keras.metrics.categorical_accuracy]
)
history = mdl.fit(train_dataset, epochs=10, steps_per_epoch=15, validation_data=val_dataset)
and I get the following error:
ValueError: Error when checking input: expected dense_16_input to have shape (4,) but got array with shape (1,)
as if Keras assumed the dataset yields one-dimensional input. If I pass input_dim=1 instead, I get a different error:
InvalidArgumentError: Incompatible shapes: [3] vs. [4]
[[{{node metrics_5/categorical_accuracy/Equal}}]] [Op:__inference_keras_scratch_graph_8223]
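One way to check what the model actually receives is to iterate over the dataset eagerly and print the element shapes. Since from_tensor_slices slices along the first axis, I'd expect each element to be a single example, features of shape (4,) and a one-hot target of shape (3,), rather than a batch:

# Sanity check: print the shape of one element yielded by the dataset.
# from_tensor_slices() slices along the first axis, so each element should be
# a single example (features (4,), one-hot target (3,)), not a batch of 32.
for features, target in train_dataset.take(1):
    print(features.shape, target.shape)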
What is the proper way to use a tf.data.Dataset with a Keras model in TensorFlow 2.0?
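My suspicion is that the calls like train_dataset.batch(32) above have no effect, because Dataset.batch() returns a new dataset rather than batching in place, so fit() still receives unbatched single examples. Is reassigning the batched dataset, as in the sketch below, the intended pattern? (This is my guess, not a confirmed fix.)

# Suspected fix (untested): batch() is not in-place, so keep the returned dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_target)).batch(32)
test_dataset = tf.data.Dataset.from_tensor_slices((test_data, test_target)).batch(32)
val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_target)).batch(32)

history = mdl.fit(train_dataset, epochs=10, steps_per_epoch=15,
                  validation_data=val_dataset)
# (I'm not sure whether validation_steps is also required in the alpha when
# validation_data is a dataset.)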