Python tensorflow creating tfrecord with multiple array features

Question

I am following the TensorFlow docs to generate a TFRecord file from three NumPy arrays, but I get an error when trying to serialize the data. I want the resulting TFRecord to contain three features.

import numpy as np
import tensorflow as tf
# some random data
x = np.random.randn(85)
y = np.random.randn(85,2128)
z = np.random.choice(range(10),(85,155))

def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def serialize_example(feature0, feature1, feature2):
    """
    Creates a tf.Example message ready to be written to a file.
    """
    # Create a dictionary mapping the feature name to the tf.Example-compatible
    # data type.
    feature = {
      'feature0': _float_feature(feature0),
      'feature1': _float_feature(feature1),
      'feature2': _int64_feature(feature2)
    }
    # Create a Features message using tf.train.Example.
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()

features_dataset = tf.data.Dataset.from_tensor_slices((x, y, z))

features_dataset

<TensorSliceDataset shapes: ((), (2128,), (155,)), types: (tf.float64, tf.float32, tf.int64)>

for f0, f1, f2 in features_dataset.take(1):
    print(f0)
    print(f1)
    print(f2)

def tf_serialize_example(f0, f1, f2):
    tf_string = tf.py_function(
        serialize_example,
        (f0, f1, f2),  # pass these args to the above function.
        tf.string)     # the return type is `tf.string`.
    return tf.reshape(tf_string, ())  # The result is a scalar

However, when I run tf_serialize_example(f0, f1, f2), I get the error:

InvalidArgumentError: TypeError: <tf.Tensor: shape=(2128,), dtype=float32, numpy=
array([-0.5435242 ,  0.97947884, -0.74457455, ...,  has type tensorflow.python.framework.ops.EagerTensor, but expected one of: int, long, float
Traceback (most recent call last):

I think the reason is that my features are arrays and not numbers. How do I make this code work for features that are arrays rather than numbers?
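To check this hypothesis in isolation, here is a minimal sketch that builds tf.train.FloatList directly (the helper float_list_accepts is hypothetical, written only for this check). Wrapping a 1-D array in a list, as _float_feature does with [value], hands FloatList a single element that is an array rather than a number, so it should be rejected; passing the array's values as plain Python floats should be accepted.

```python
import numpy as np
import tensorflow as tf

# One row of the (85, 2128) array `y` from the question.
row = np.random.randn(2128)

def float_list_accepts(value):
    """Return True if tf.train.FloatList accepts `value`, False if it raises."""
    try:
        tf.train.FloatList(value=value)
        return True
    except (TypeError, ValueError):
        return False

# Same shape of input as _float_feature(row) builds: a list holding one array.
wrapped_ok = float_list_accepts([row])
# The array's values themselves, converted to Python floats.
flat_ok = float_list_accepts(row.tolist())
print(wrapped_ok, flat_ok)
```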

Sorry, but I don't see tf_serialize_example in your code. Can you clarify so that one can reproduce your error?
– dopexxx, Commented Jan 30, 2020 at 16:46
If I replace value=[value] with value=value.flatten(), the code runs fine for me! I am using TF version 1.15.0. What version do you have?
– dopexxx, Commented Jan 30, 2020 at 17:03
Thanks for the comment. My version is 2.1.0, and using flatten() yields the error: AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'flatten'. Should I downgrade?
– PascalIv, Commented Jan 30, 2020 at 19:21
With version 1.15.0, I get "RuntimeError: __iter__() is only supported inside of tf.function or when eager execution is enabled." And if I add the line tf.compat.v1.enable_eager_execution(), I get UnknownError: AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'flatten'.
– PascalIv, Commented Jan 31, 2020 at 9:14
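For the tutorial route in TF 2.x, one possible fix follows from the AttributeError above: inside tf.py_function the arguments are EagerTensors, which have no .flatten(), so the helpers can call .numpy() first and flatten afterwards. This is a sketch assuming TF 2.x with eager execution (the default), not a tested fix for the asker's exact setup; the np.asarray(...).ravel().tolist() conversion is my own choice rather than the tutorial's, used because it turns scalars and arrays alike into a flat list of plain Python numbers.

```python
import numpy as np
import tensorflow as tf  # assumes TF 2.x, where eager execution is on by default

def _float_feature(value):
    """Returns a float_list from a scalar or an array of floats."""
    # ravel() flattens scalars and N-D arrays to 1-D; tolist() gives Python floats.
    return tf.train.Feature(
        float_list=tf.train.FloatList(value=np.asarray(value).ravel().tolist()))

def _int64_feature(value):
    """Returns an int64_list from a scalar or an array of ints."""
    return tf.train.Feature(
        int64_list=tf.train.Int64List(value=np.asarray(value).ravel().tolist()))

def serialize_example(feature0, feature1, feature2):
    """Creates a serialized tf.train.Example from three EagerTensors."""
    # EagerTensors have no .flatten(); .numpy() converts them to NumPy values.
    feature = {
        'feature0': _float_feature(feature0.numpy()),
        'feature1': _float_feature(feature1.numpy()),
        'feature2': _int64_feature(feature2.numpy()),
    }
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()

def tf_serialize_example(f0, f1, f2):
    tf_string = tf.py_function(serialize_example, (f0, f1, f2), tf.string)
    return tf.reshape(tf_string, ())  # the result is a scalar string tensor

# some random data, as in the question
x = np.random.randn(85)
y = np.random.randn(85, 2128)
z = np.random.choice(range(10), (85, 155))

features_dataset = tf.data.Dataset.from_tensor_slices((x, y, z))
# Each element becomes one serialized Example holding the three features of one row.
serialized_dataset = features_dataset.map(tf_serialize_example)
```

The resulting serialized_dataset yields one serialized Example per row. The TensorFlow TFRecord tutorial writes such a dataset with tf.data.experimental.TFRecordWriter; alternatively, the records can be written one by one with tf.io.TFRecordWriter.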

dopexxx · Accepted Answer · 2020-02-01 11:55:04Z


Okay, I found time to have a closer look now. I noticed that the usage of features_dataset and tf_serialize_example comes from the tutorial on the TensorFlow webpage. I don't know what the advantages of this method are or how to fix it.

But here is a workflow that should work for your code (I re-opened the generated TFRecord file and it was fine).

import numpy as np
import tensorflow as tf

# some random data
x = np.random.randn(85)
y = np.random.randn(85,2128)
z = np.random.choice(range(10),(85,155))

def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=value.flatten()))

def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""

    return tf.train.Feature(int64_list=tf.train.Int64List(value=value.flatten()))

def serialize_example(feature0, feature1, feature2):
    """
    Creates a tf.Example message ready to be written to a file.
    """
    # Create a dictionary mapping the feature name to the tf.Example-compatible
    # data type.
    feature = {
      'feature0': _float_feature(feature0),
      'feature1': _float_feature(feature1),
      'feature2': _int64_feature(feature2)
    }
    # Create a Features message using tf.train.Example.
    return tf.train.Example(features=tf.train.Features(feature=feature))


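# tf.io.TFRecordWriter works in TF 1.15 and 2.x; tf.python_io is not in the TF 2.x top-level API.
# This writes a single Example that holds all 85 rows (flattened).
with tf.io.TFRecordWriter('TEST.tfrecords') as writer:
    example = serialize_example(x, y, z)
    writer.write(example.SerializeToString())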

The main difference in this code is that serialize_example receives NumPy arrays instead of TensorFlow tensors, so .flatten() is available. Hope this helps!
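Note that serialize_example(x, y, z) above stores all 85 rows in a single Example. If one Example per row is wanted instead (which is what from_tensor_slices in the question produces), the same NumPy-based approach can be applied in a loop. This is a sketch; the function name write_rows and the file name TEST_rows.tfrecords are hypothetical.

```python
import numpy as np
import tensorflow as tf

def _float_feature(value):
    """Returns a float_list from a NumPy scalar or array."""
    return tf.train.Feature(
        float_list=tf.train.FloatList(value=np.asarray(value).ravel().tolist()))

def _int64_feature(value):
    """Returns an int64_list from a NumPy scalar or array."""
    return tf.train.Feature(
        int64_list=tf.train.Int64List(value=np.asarray(value).ravel().tolist()))

def write_rows(path, x, y, z):
    """Hypothetical helper: write one tf.train.Example per row of x, y, z."""
    with tf.io.TFRecordWriter(path) as writer:
        # zip over NumPy arrays yields one row (or scalar) from each array.
        for f0, f1, f2 in zip(x, y, z):
            feature = {
                'feature0': _float_feature(f0),
                'feature1': _float_feature(f1),
                'feature2': _int64_feature(f2),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())

# some random data, as in the question
x = np.random.randn(85)
y = np.random.randn(85, 2128)
z = np.random.choice(range(10), (85, 155))
write_rows('TEST_rows.tfrecords', x, y, z)
```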


Thanks a lot. This worked for me. I have problems parsing this again (getting "Can't parse serialized Example.", also when using this answer: stackoverflow.com/questions/53499409/…). However, I guess that is unrelated to this question.
– PascalIv
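On reading the file back: one frequent cause of "Can't parse serialized Example." is a tf.io.FixedLenFeature whose shape does not match the number of values actually stored. Below is a sketch for the file written by the answer's code, assuming it holds one Example with all 85 rows flattened; N, Y_DIM, Z_DIM, parse_example and load_dataset are hypothetical names.

```python
import tensorflow as tf

# Sizes of the arrays written by serialize_example(x, y, z) in the answer:
# a single Example holding all 85 rows, flattened.
N, Y_DIM, Z_DIM = 85, 2128, 155

feature_description = {
    # FloatList stores float32, so float64 input comes back as float32.
    'feature0': tf.io.FixedLenFeature([N], tf.float32),
    'feature1': tf.io.FixedLenFeature([N * Y_DIM], tf.float32),
    'feature2': tf.io.FixedLenFeature([N * Z_DIM], tf.int64),
}

def parse_example(serialized):
    """Parse one serialized Example and restore the original 2-D shapes."""
    parsed = tf.io.parse_single_example(serialized, feature_description)
    return (parsed['feature0'],
            tf.reshape(parsed['feature1'], [N, Y_DIM]),
            tf.reshape(parsed['feature2'], [N, Z_DIM]))

def load_dataset(path):
    """Lazily read a TFRecord file and parse every record in it."""
    return tf.data.TFRecordDataset(path).map(parse_example)
```

If the file instead holds one Example per row, the feature shapes would be [], [2128] and [155], and no reshape is needed.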
