I am trying to write a tf.function-annotated map function in Python for a TensorFlow tf.data input pipeline.
The function should convert a string into a one-hot encoded tensor. The input string has the format [ab12]+.
(There are actually more chars and numbers in the string, but those are good enough for the example below.)
Here is a minimal example:
import numpy as np
import tensorflow as tf

DIM = 100
DIM_A = 1
DIM_B = 2

pos = tf.Variable(0, dtype=tf.int32)

@tf.function
def my_func(string):
    output = np.zeros(DIM * 10, dtype=np.float32)
    pos.assign(0)
    for ch in tf.strings.bytes_split(string):
        if tf.math.equal(ch, tf.constant("1")):
            pos.assign_add(1)
        elif tf.math.equal(ch, tf.constant("2")):
            pos.assign_add(2)
        elif tf.math.equal(ch, tf.constant("a")):
            output[DIM_A + DIM * pos] = 1
            pos.assign_add(1)
        elif tf.math.equal(ch, tf.constant("b")):
            output[DIM_B + DIM * pos] = 1
            pos.assign_add(1)
    return output

s = b"a1b2b"
print(my_func(s))
When calculating the index at which to set the 1 in the output tensor, I get the following error:
NotImplementedError: in user code:
<ipython-input-14-baa9b1605ae2>:18 my_func *
output[DIM_A + DIM * pos] = 1
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:749 __array__
" array.".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (add:0) to a numpy array.
The code works in eager mode, but breaks when building a graph.
I have a working version that uses a dynamically sized TensorArray to build a sparse version of the output tensor first and then converts it to a dense tensor, but it is really slow. A fixed-size TensorArray instead of the numpy array is also very slow. I am trying to make it faster.
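For reference, here is a minimal sketch of the slow TensorArray-based approach I am describing (the name my_func_ta and the exact loop-variable handling are just for illustration; this assumes TF2 with AutoGraph converting the loop and conditionals):

```python
import tensorflow as tf

DIM = 100
DIM_A = 1
DIM_B = 2

@tf.function
def my_func_ta(string):
    # Collect the flat indices where the output should be 1.
    idx = tf.TensorArray(tf.int64, size=0, dynamic_size=True)
    pos = tf.constant(0, dtype=tf.int64)  # current position in the output
    n = tf.constant(0, dtype=tf.int32)    # number of indices written so far
    for ch in tf.strings.bytes_split(string):
        if tf.math.equal(ch, tf.constant("1")):
            pos += 1
        elif tf.math.equal(ch, tf.constant("2")):
            pos += 2
        elif tf.math.equal(ch, tf.constant("a")):
            idx = idx.write(n, DIM_A + DIM * pos)
            n += 1
            pos += 1
        elif tf.math.equal(ch, tf.constant("b")):
            idx = idx.write(n, DIM_B + DIM * pos)
            n += 1
            pos += 1
    # Build a sparse tensor from the collected indices, then densify it.
    indices = tf.expand_dims(idx.stack(), axis=1)
    sparse = tf.SparseTensor(
        indices=indices,
        values=tf.ones(tf.shape(indices)[0], dtype=tf.float32),
        dense_shape=[DIM * 10],
    )
    return tf.sparse.to_dense(sparse)
```

This avoids the numpy assignment inside the graph, so it traces fine, but the per-character TensorArray writes make it much slower than I would like.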