I am following the TensorFlow docs to generate a TFRecord file from three NumPy arrays, but I get an error when trying to serialize the data. I want the resulting TFRecord file to contain three features.

import numpy as np
import tensorflow as tf
# some random data
x = np.random.randn(85)
y = np.random.randn(85,2128)
z = np.random.choice(range(10),(85,155))

def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def serialize_example(feature0, feature1, feature2):
    """
    Creates a tf.Example message ready to be written to a file.
    """
    # Create a dictionary mapping the feature name to the tf.Example-compatible
    # data type.
    feature = {
      'feature0': _float_feature(feature0),
      'feature1': _float_feature(feature1),
      'feature2': _int64_feature(feature2)
    }
    # Create a Features message using tf.train.Example.
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()

features_dataset = tf.data.Dataset.from_tensor_slices((x, y, z))

features_dataset

<TensorSliceDataset shapes: ((), (2128,), (155,)), types: (tf.float64, tf.float32, tf.int64)>

for f0,f1,f2 in features_dataset.take(1):
    print(f0)
    print(f1)
    print(f2)

def tf_serialize_example(f0,f1,f2):
  tf_string = tf.py_function(
    serialize_example,
    (f0,f1,f2),  # pass these args to the above function.
    tf.string)      # the return type is `tf.string`.
  return tf.reshape(tf_string, ()) # The result is a scalar

Yet when I run tf_serialize_example(f0,f1,f2), I get the error:

InvalidArgumentError: TypeError: <tf.Tensor: shape=(2128,), dtype=float32, numpy=
array([-0.5435242 ,  0.97947884, -0.74457455, ...,  has type tensorflow.python.framework.ops.EagerTensor, but expected one of: int, long, float
Traceback (most recent call last):

I think the reason is that my features are arrays and not single numbers. How do I make this code work for features that are arrays?

  • Sorry, but I don't see tf_serialize_example in your code. Can you clarify so that one can reproduce your error? Commented Jan 30, 2020 at 16:46
  • If I replace value=[value] with value=value.flatten() the code runs fine for me! I use tf version: 1.15.0. What version do you have? Commented Jan 30, 2020 at 17:03
  • Thanks for the comment. My version is 2.1.0, and using flatten() yields the error: AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'flatten'. Should I maybe downgrade? Commented Jan 30, 2020 at 19:21
  • Yep, the code is for TF 1.X. Commented Jan 30, 2020 at 19:28
  • With version 1.15.0, I get "RuntimeError: __iter__() is only supported inside of tf.function or when eager execution is enabled." And if I add the line `tf.compat.v1.enable_eager_execution()`, I get UnknownError: AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'flatten'. Commented Jan 31, 2020 at 9:14

1 Answer

Okay, I found time to take a closer look. I noticed that the use of features_dataset and tf_serialize_example comes from the tutorial on the TensorFlow website. I don't know what the advantages of that approach are, or how to fix it.

But here's a workflow that should work for your code (I re-opened the generated tfrecords files and they were fine).

import numpy as np
import tensorflow as tf

# some random data
x = np.random.randn(85)
y = np.random.randn(85,2128)
z = np.random.choice(range(10),(85,155))

def _float_feature(value):
    """Returns a float_list from a NumPy array of floats."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=value.flatten()))

def _int64_feature(value):
    """Returns an int64_list from a NumPy array of ints."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value.flatten()))

def serialize_example(feature0, feature1, feature2):
    """
    Creates a tf.Example message ready to be written to a file.
    """
    # Create a dictionary mapping the feature name to the tf.Example-compatible
    # data type.
    feature = {
      'feature0': _float_feature(feature0),
      'feature1': _float_feature(feature1),
      'feature2': _int64_feature(feature2)
    }
    # Create a Features message using tf.train.Example.
    return tf.train.Example(features=tf.train.Features(feature=feature))


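# tf.io.TFRecordWriter is available in TF 1.15 and TF 2.x
# (the tf.python_io namespace was removed in TF 2.x).
with tf.io.TFRecordWriter('TEST.tfrecords') as writer:
    # Note: this stores all 85 rows in one single Example record.
    example = serialize_example(x, y, z)
    writer.write(example.SerializeToString())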

The main difference in this code is that you pass NumPy arrays, rather than TensorFlow tensors, to serialize_example. NumPy arrays have a flatten() method, which eager tensors lack. Hope this helps.
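If you would rather keep the tf.data pipeline from the question, one possible variation for TF 2.x is to convert each eager tensor to NumPy inside the function called by tf.py_function. The following is only a sketch under that assumption (TF 2.x with eager execution); the names are reused from the question, and np.ravel(...).tolist() is my own choice for producing plain Python numbers, not something taken from the tutorial.

```python
import numpy as np
import tensorflow as tf  # assumption: TF 2.x, eager execution enabled

def _float_feature(value):
    """Returns a float_list from an eager tensor of floats (any shape)."""
    # .numpy() converts the eager tensor; ravel + tolist gives plain Python floats.
    return tf.train.Feature(
        float_list=tf.train.FloatList(value=np.ravel(value.numpy()).tolist()))

def _int64_feature(value):
    """Returns an int64_list from an eager tensor of ints (any shape)."""
    return tf.train.Feature(
        int64_list=tf.train.Int64List(value=np.ravel(value.numpy()).tolist()))

def serialize_example(feature0, feature1, feature2):
    """Serializes one row (three eager tensors) into a tf.Example byte string."""
    feature = {
        'feature0': _float_feature(feature0),
        'feature1': _float_feature(feature1),
        'feature2': _int64_feature(feature2),
    }
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()

def tf_serialize_example(f0, f1, f2):
    # tf.py_function hands eager tensors to serialize_example.
    tf_string = tf.py_function(serialize_example, (f0, f1, f2), tf.string)
    return tf.reshape(tf_string, ())  # the result is a scalar string

# Same random data as in the question.
x = np.random.randn(85)
y = np.random.randn(85, 2128)
z = np.random.choice(range(10), (85, 155))

features_dataset = tf.data.Dataset.from_tensor_slices((x, y, z))
serialized_dataset = features_dataset.map(tf_serialize_example)
```

Each element of serialized_dataset is then a scalar tf.string holding one row, and its bytes (element.numpy()) can be written one record at a time with tf.io.TFRecordWriter.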

1 Comment

Thanks a lot, this worked for me. I still have problems parsing the file back (I get "Can't parse serialized Example.", even when using this answer: stackoverflow.com/questions/53499409/…), but I guess that is unrelated to this question.
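I cannot confirm the cause of that parse error, but one possibility is a shape mismatch: the answer's code stores all 85 rows in a single Example, so the stored lists are much longer than a per-row FixedLenFeature would expect. As a rough sketch (assuming TF 2.x; make_example, parse_example and the temporary file location are hypothetical names of mine), writing one Example per row and parsing with matching shapes could look like this:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf  # assumption: TF 2.x

# Same shapes as in the question, cast to the dtypes used when parsing.
x = np.random.randn(85).astype(np.float32)
y = np.random.randn(85, 2128).astype(np.float32)
z = np.random.choice(range(10), (85, 155)).astype(np.int64)

def make_example(f0, f1, f2):
    """Builds a tf.train.Example for one row (hypothetical helper)."""
    feature = {
        'feature0': tf.train.Feature(
            float_list=tf.train.FloatList(value=[float(f0)])),
        'feature1': tf.train.Feature(
            float_list=tf.train.FloatList(value=f1.ravel().tolist())),
        'feature2': tf.train.Feature(
            int64_list=tf.train.Int64List(value=f2.ravel().tolist())),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'TEST.tfrecords')

# Write one record per row instead of one record for the whole arrays.
with tf.io.TFRecordWriter(path) as writer:
    for f0, f1, f2 in zip(x, y, z):
        writer.write(make_example(f0, f1, f2).SerializeToString())

# The shapes must match the number of values stored per record.
feature_description = {
    'feature0': tf.io.FixedLenFeature([], tf.float32),
    'feature1': tf.io.FixedLenFeature([2128], tf.float32),
    'feature2': tf.io.FixedLenFeature([155], tf.int64),
}

def parse_example(serialized):
    """Parses one serialized Example into a dict of tensors."""
    return tf.io.parse_single_example(serialized, feature_description)

parsed_dataset = tf.data.TFRecordDataset(path).map(parse_example)
first = next(iter(parsed_dataset))
```

If the file was written with the whole arrays in one record instead, the FixedLenFeature shapes would have to describe the flattened lengths of that single record.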
