
I'm trying to create a Dataset object in TensorFlow 1.14 (I have some legacy code that I can't change for this specific project) starting from numpy arrays, but every time I try, everything gets copied onto my graph, so when I create an event log file it is huge (719 MB in this case).

Originally I tried using the function tf.data.Dataset.from_tensor_slices(), but it didn't work. Then I read that this is a common problem and someone suggested trying generators instead, so I tried the following code, but again I got a huge event file (719 MB again):

def fetch_batch(x, y, batch):
    i = 0
    while i < batch:
        yield (x[i, :, :, :], y[i])
        i += 1

train, test = tf.keras.datasets.fashion_mnist.load_data()
images, labels = train  
images = images/255

training_dataset = tf.data.Dataset.from_generator(
    fetch_batch,
    args=[images, np.int32(labels), batch_size],
    output_types=(tf.float32, tf.int32),
    output_shapes=(tf.TensorShape(features_shape), tf.TensorShape(labels_shape)))

file_writer = tf.summary.FileWriter("/content", graph=tf.get_default_graph())

I know in this case I could use the tensorflow_datasets API and it would be easier, but this is a more general question about how to create datasets in general, not only with the MNIST one. Could you explain to me what I am doing wrong? Thank you

  • Can you explain a bit more in detail what's causing your event file to be that large? Is it creating repetitive subgraphs? Commented Nov 25, 2019 at 0:47
  • Could you explain what didn't work with from_tensor_slices? Commented Nov 25, 2019 at 13:43

1 Answer


I guess it's because you are using args in from_generator. This will put the provided args into the graph as constants.

What you could do is define a function that returns a generator that iterates through your set, something like (haven't tested):

def data_generator(images, labels):
  def fetch_examples():
    i = 0
    while True:
      example = (images[i], labels[i])
      i += 1
      i %= len(labels)
      yield example
  return fetch_examples
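As a quick sanity check of this closure pattern outside TensorFlow, here is a standalone sketch (the toy arrays below are made up for illustration; the real images/labels come from fashion_mnist as in the question):

```python
import numpy as np

# Same closure pattern as above: data_generator captures the arrays,
# and the inner generator cycles over them forever.
def data_generator(images, labels):
    def fetch_examples():
        i = 0
        while True:
            example = (images[i], labels[i])
            i += 1
            i %= len(labels)  # wrap around so the generator never exhausts
            yield example
    return fetch_examples

images = np.arange(6).reshape(3, 2)   # 3 fake "images" of shape (2,)
labels = np.array([0, 1, 2])

gen = data_generator(images, labels)()
examples = [next(gen) for _ in range(5)]  # pull more examples than we have

print([lab for _, lab in examples])  # -> [0, 1, 2, 0, 1] (wraps around)
```

Because the arrays are only captured by the Python closure and pulled lazily, nothing ends up baked into the TensorFlow graph.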

This would give in your example:

train, test = tf.keras.datasets.fashion_mnist.load_data()
images, labels = train  
images = images/255

training_dataset = tf.data.Dataset.from_generator(
    data_generator(images, labels),
    output_types=(tf.float32, tf.int32),
    output_shapes=(tf.TensorShape(features_shape), tf.TensorShape(labels_shape))).batch(batch_size)

file_writer = tf.summary.FileWriter("/content", graph=tf.get_default_graph())

Note that I renamed fetch_batch to fetch_examples, since you probably want to batch using the dataset utilities (.batch).
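To illustrate what .batch(batch_size) does with such a generator, here is a rough plain-Python sketch (no TensorFlow; take_batch is a hypothetical helper, and the stacking logic is a simplification of what tf.data actually does):

```python
import itertools
import numpy as np

def fetch_examples_from(images, labels):
    # Endless per-example generator, as in the answer above.
    i = 0
    while True:
        yield images[i], labels[i]
        i = (i + 1) % len(labels)

def take_batch(gen, batch_size):
    # Collect batch_size examples and stack them along a new leading
    # axis, roughly mimicking Dataset.batch().
    pairs = list(itertools.islice(gen, batch_size))
    xs = np.stack([x for x, _ in pairs])
    ys = np.stack([y for _, y in pairs])
    return xs, ys

images = np.ones((10, 28, 28), dtype=np.float32)  # fake 28x28 images
labels = np.arange(10, dtype=np.int32)

gen = fetch_examples_from(images, labels)
xs, ys = take_batch(gen, 4)
print(xs.shape, ys.tolist())  # -> (4, 28, 28) [0, 1, 2, 3]
```

The point is that the generator yields one example at a time and the batching happens downstream, so the arrays never need to be embedded in the graph.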


4 Comments

Yes, I think you are right. I was about to post how I solved the problem (I actually found someone on GitHub suggesting this) and it worked for me: github.com/tensorflow/tensorflow/issues/14053 Thank you!
Cool, if this solution works then please accept it. Also, next time, try to provide code in your question that works if copy-pasted, and include version numbers (typically for TensorFlow, the API changes a lot between 1.14 and 2.0).
Done, but one more thing: what do you mean with "you won't be able to use multiprocessing"? Is there any more efficient way?
Actually forget what I said, you don't need multiprocessing at this stage (just getting the data), so I was completely confused.
