
I am trying to write a tf.function-annotated map function in Python for a TensorFlow tf.data input pipeline.

The function should convert a string into a one-hot encoded tensor. The input string has the format [ab12]+. (There are actually more characters and numbers in the string, but these are enough for the example below.)

Here is a minimal example:

import tensorflow as tf
import numpy as np

DIM = 100
DIM_A = 1
DIM_B = 2

pos = tf.Variable(0, dtype=tf.int32)

@tf.function
def my_func(string):
  output = np.zeros(DIM * 10, dtype=np.float32)
  pos.assign(0)
  for ch in tf.strings.bytes_split(string):
    if tf.math.equal(ch, tf.constant("1")):
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("2")):
        pos.assign_add(2)
    elif tf.math.equal(ch, tf.constant("a")):
        output[DIM_A + DIM * pos] = 1
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("b")):
        output[DIM_B + DIM * pos] = 1
        pos.assign_add(1)
  return output

s = b"a1b2b"
print(my_func(s))

When calculating the index at which to set the 1 in the output tensor, I get the following error:

NotImplementedError: in user code:

<ipython-input-14-baa9b1605ae2>:18 my_func  *
    output[DIM_A + DIM * pos] = 1
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:749 __array__
    " array.".format(self.name))

NotImplementedError: Cannot convert a symbolic Tensor (add:0) to a numpy array.

The code works in eager mode, but breaks when building a graph.

I have a working version that uses a dynamically sized TensorArray to build a sparse version of the output tensor first and then converts it to a dense tensor, but this is really slow. A fixed-size TensorArray instead of the numpy array is also very slow. I am trying to make it faster.
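(Editor's note, not from the original post: in graph mode a symbolic Tensor has no concrete value, so it cannot be used to index a NumPy array, and Tensors do not support item assignment at all. The closest graph-mode equivalent of `output[i] = 1` is tf.tensor_scatter_nd_update, which returns a new tensor with the element written. A minimal sketch with an illustrative function name:)

```python
import tensorflow as tf

@tf.function
def set_one(i):
    # Graph-mode equivalent of "output[i] = 1": tensors are immutable, so
    # tf.tensor_scatter_nd_update returns a new tensor with the write applied.
    output = tf.zeros(10, dtype=tf.float32)
    return tf.tensor_scatter_nd_update(output, tf.reshape(i, [1, 1]), [1.0])

print(set_one(tf.constant(3)).numpy())
```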

1 Answer

1) You can't use numpy in graph mode, so output should be created with tf.zeros, not np.zeros.

2) You can't assign to the tf.zeros Tensor, so you should probably just construct it from scratch using tf.one_hot.

Minimum working example:

import tensorflow as tf
import numpy as np 

DIM = 100
DIM_A = 1
DIM_B = 2

pos = tf.Variable(0, dtype=tf.int32)

@tf.function
def my_func(string):
  output = tf.zeros(DIM * 10, dtype=tf.float32)
  pos.assign(0)
  for ch in tf.strings.bytes_split(string):
    if tf.math.equal(ch, tf.constant("1")):
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("2")):
        pos.assign_add(2)
    elif tf.math.equal(ch, tf.constant("a")):
        output = tf.one_hot(DIM_A + DIM * pos, DIM * 10, dtype=tf.float32)
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("b")):
        output = tf.one_hot(DIM_B + DIM * pos, DIM * 10, dtype=tf.float32)
        pos.assign_add(1)
  return output

s = b"a1b2b"
print(my_func(s).numpy())

This function prints a one-hot encoded vector. I don't know if the index is exactly the one you want, so you'll have to double-check that the offset is correct.


1 Comment

There is more than one "1" in my output tensor, but changing your "output = tf.one_hot()" to "output += tf.one_hot()" fixes this. However, this is still very slow, as my tensor size is about 1000, so about 1000 adds have to be made on every write. The numpy array in my example works fine as long as the calculated index of the "1" doesn't depend on "pos".
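(Editor's note, not part of the original thread: one way to avoid the repeated full-size adds described in the comment is to accumulate only the target indices in a TensorArray and build the dense output once at the end with tf.scatter_nd, which sums duplicate indices, matching the "+=" behavior. A sketch under the same alphabet as the minimal example; the function name is illustrative:)

```python
import tensorflow as tf

DIM = 100
DIM_A = 1
DIM_B = 2

@tf.function
def my_func_scatter(string):
    # Accumulate only the target indices, then build the dense tensor once.
    indices = tf.TensorArray(tf.int32, size=0, dynamic_size=True)
    pos = tf.constant(0)
    n = tf.constant(0)
    for ch in tf.strings.bytes_split(string):
        if tf.math.equal(ch, tf.constant("1")):
            pos += 1
        elif tf.math.equal(ch, tf.constant("2")):
            pos += 2
        elif tf.math.equal(ch, tf.constant("a")):
            indices = indices.write(n, DIM_A + DIM * pos)
            n += 1
            pos += 1
        elif tf.math.equal(ch, tf.constant("b")):
            indices = indices.write(n, DIM_B + DIM * pos)
            n += 1
            pos += 1
    idx = indices.stack()
    updates = tf.ones_like(idx, dtype=tf.float32)
    # scatter_nd sums updates at duplicate indices, like "output += one_hot".
    return tf.scatter_nd(tf.expand_dims(idx, 1), updates, [DIM * 10])

print(my_func_scatter(b"a1b2b").numpy())
```

This does one scatter of n elements instead of n adds over the full-size tensor, so the cost no longer scales with the output dimension on every character.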
