
I am trying to write a tf.function-annotated map function in Python for a TensorFlow tf.data input pipeline.

The function should convert a string into a one-hot encoded tensor. The input string has the format [ab12]+. (There are actually more characters and numbers in the string, but these are enough for the example below.)

Here is a minimal example:

import tensorflow as tf
import numpy as np

DIM = 100
DIM_A = 1
DIM_B = 2

pos = tf.Variable(0, dtype=tf.int32)

@tf.function
def my_func(string):
  output = np.zeros(DIM * 10, dtype=np.float32)
  pos.assign(0)
  for ch in tf.strings.bytes_split(string):
    if tf.math.equal(ch, tf.constant("1")):
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("2")):
        pos.assign_add(2)
    elif tf.math.equal(ch, tf.constant("a")):
        output[DIM_A + DIM * pos] = 1
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("b")):
        output[DIM_B + DIM * pos] = 1
        pos.assign_add(1)
  return output

s = b"a1b2b"
print(my_func(s))

When calculating the index at which to set the 1 in the output tensor, I get the following error:

NotImplementedError: in user code:

<ipython-input-14-baa9b1605ae2>:18 my_func  *
    output[DIM_A + DIM * pos] = 1
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:749 __array__
    " array.".format(self.name))

NotImplementedError: Cannot convert a symbolic Tensor (add:0) to a numpy array.

The code works in eager mode, but breaks when building a graph.

I have a working version that uses a dynamically sized TensorArray to build a sparse version of the output tensor first and then converts it to a dense tensor, but this is really slow. A fixed-size TensorArray instead of the numpy array is also very slow. I am trying to make it faster.
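(Editor's note, not from the original post: in graph mode a symbolic Tensor has no concrete value, so it cannot be used to index a NumPy array, and Tensors do not support item assignment at all. The closest graph-mode equivalent of `output[i] = 1` is tf.tensor_scatter_nd_update, which returns a new tensor with the element written. A minimal sketch with an illustrative function name:)

```python
import tensorflow as tf

@tf.function
def set_one(i):
    # Graph-mode equivalent of "output[i] = 1": tensors are immutable, so
    # tf.tensor_scatter_nd_update returns a new tensor with the write applied.
    output = tf.zeros(10, dtype=tf.float32)
    return tf.tensor_scatter_nd_update(output, tf.reshape(i, [1, 1]), [1.0])

print(set_one(tf.constant(3)).numpy())
```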

1 Answer

1) You can't use numpy in graph mode, so output should be created with tf.zeros, not np.zeros.

2) You can't assign to the tf.zeros Tensor, so you should probably just construct it from scratch using tf.one_hot.

Minimum working example:

import tensorflow as tf
import numpy as np 

DIM = 100
DIM_A = 1
DIM_B = 2

pos = tf.Variable(0, dtype=tf.int32)

@tf.function
def my_func(string):
  output = tf.zeros(DIM * 10, dtype=tf.float32)
  pos.assign(0)
  for ch in tf.strings.bytes_split(string):
    if tf.math.equal(ch, tf.constant("1")):
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("2")):
        pos.assign_add(2)
    elif tf.math.equal(ch, tf.constant("a")):
        output = tf.one_hot(DIM_A + DIM * pos, DIM * 10, dtype=tf.float32)
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("b")):
        output = tf.one_hot(DIM_B + DIM * pos, DIM * 10, dtype=tf.float32)
        pos.assign_add(1)
  return output

s = b"a1b2b"
print(my_func(s).numpy())

This function prints a one-hot encoded vector. I don't know if the index is exactly the one you want, so you'll have to double-check that the offset is correct.


1 Comment

There is more than one "1" in my output tensor, but changing your "output = tf.one_hot()" to "output += tf.one_hot()" fixes this. However, this is still very slow, as my tensor size is about 1000, so about 1000 adds have to be made on every write. The numpy array in my example works fine as long as the calculated index of the "1" doesn't depend on "pos".
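(Editor's note, not part of the original thread: one way to avoid the repeated full-size adds described in the comment is to accumulate only the target indices in a TensorArray and build the dense output once at the end with tf.scatter_nd, which sums duplicate indices, matching the "+=" behavior. A sketch under the same alphabet as the minimal example; the function name is illustrative:)

```python
import tensorflow as tf

DIM = 100
DIM_A = 1
DIM_B = 2

@tf.function
def my_func_scatter(string):
    # Accumulate only the target indices, then build the dense tensor once.
    indices = tf.TensorArray(tf.int32, size=0, dynamic_size=True)
    pos = tf.constant(0)
    n = tf.constant(0)
    for ch in tf.strings.bytes_split(string):
        if tf.math.equal(ch, tf.constant("1")):
            pos += 1
        elif tf.math.equal(ch, tf.constant("2")):
            pos += 2
        elif tf.math.equal(ch, tf.constant("a")):
            indices = indices.write(n, DIM_A + DIM * pos)
            n += 1
            pos += 1
        elif tf.math.equal(ch, tf.constant("b")):
            indices = indices.write(n, DIM_B + DIM * pos)
            n += 1
            pos += 1
    idx = indices.stack()
    updates = tf.ones_like(idx, dtype=tf.float32)
    # scatter_nd sums updates at duplicate indices, like "output += one_hot".
    return tf.scatter_nd(tf.expand_dims(idx, 1), updates, [DIM * 10])

print(my_func_scatter(b"a1b2b").numpy())
```

This does one scatter of n elements instead of n adds over the full-size tensor, so the cost no longer scales with the output dimension on every character.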
