
I'm trying to load several files into my pipeline. Each file contains 3 signals, and the 3 signals are ordered in 10-minute intervals. When I load the first file it has this shape: (86, 75000, 3). I'm using TensorFlow 1.14.

I have tried the following code; to make it runnable for you, I simulate the loading with zeros:

import numpy as np
import tensorflow as tf


# Stand-in for the real loader: returns an all-zeros array
# shaped like one file, (86, 75000, 3).
def my_func(x):
    p = np.zeros([86, 75000, 3])
    return p

# Wrap the NumPy loader so it can run inside the graph.
def load_sign(path):
    sign = tf.compat.v1.numpy_function(my_func, [path], tf.float64)
    return sign

s = [1, 2]  # list of filenames (paths); here I simulate them with numbers

AUTOTUNE = tf.data.experimental.AUTOTUNE  
ds = tf.data.Dataset.from_tensor_slices(s)
ds = ds.map(load_sign, num_parallel_calls=AUTOTUNE)

itera = tf.data.make_one_shot_iterator(ds)
x = itera.get_next()

with tf.Session() as sess:
    # sess.run(itera.initializer)
    va_sign = sess.run([x])
    va = np.array(va_sign)
    print(va.shape)

I get this shape: (1, 86, 75000, 3), while I would like to obtain 3 different variables, each with this shape: (75000,).

How can I do it? I have also tried this code, but I get an error:

import numpy as np
import tensorflow as tf


def my_func(x):
    p = np.zeros([86, 75000, 3])
    x = p[:,:,0]
    y = p[:, :, 1]
    z = p[:, :, 2]
    return x, y, z

# Load the signals; in this example they are created as zeros.
def load_sign(path):
    a, b, c = tf.compat.v1.numpy_function(my_func, [path], tf.float64)
    return tf.data.Dataset.zip((a,b,c))

s = [1, 2]  # list of filenames (paths); here I simulate them with numbers

AUTOTUNE = tf.data.experimental.AUTOTUNE  
ds = tf.data.Dataset.from_tensor_slices(s)
ds = ds.map(load_sign, num_parallel_calls=AUTOTUNE)

itera = tf.data.make_one_shot_iterator(ds)
x, y, z = itera.get_next()

with tf.Session() as sess:
    # sess.run(itera.initializer)
    va_sign = sess.run([x])
    va = np.array(va_sign)
    print(va.shape)

Here I would expect x to have this shape: (86, 75000), but instead I get this error. How can I make it work? And even better, can I obtain an x with this shape: (75000,)?

TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn.


1 Answer

tf.compat.v1.numpy_function wraps a Python function so that it can be called inside the graph; the values it returns are whatever my_func returns. With a single dtype as the third argument, as in a, b, c = tf.compat.v1.numpy_function(my_func, [path], tf.float64), it returns a single tensor, and unpacking that tensor into a, b, c is what raises the TypeError. Since my_func returns three arrays, pass a list of three dtypes instead, and return the resulting tensors directly from load_sign rather than wrapping them in tf.data.Dataset.zip (which expects datasets, not tensors). So the code should look like this:

def my_func(x):
    p = np.zeros([86, 75000, 3])
    # Split the (86, 75000, 3) array into one (86, 75000) array per signal.
    x = p[:, :, 0]
    y = p[:, :, 1]
    z = p[:, :, 2]
    return x, y, z

def load_sign(path):
    # One output dtype per value returned by my_func.
    func = tf.compat.v1.numpy_function(my_func, [path],
                                       [tf.float64, tf.float64, tf.float64])
    return func

The rest is pretty much the same with minor tweaks:

s = [1, 2]  

AUTOTUNE = tf.data.experimental.AUTOTUNE  
ds = tf.data.Dataset.from_tensor_slices(s)
ds = ds.map(load_sign, num_parallel_calls=AUTOTUNE)

itera = tf.data.make_one_shot_iterator(ds)
output = itera.get_next() # Returns tuple of 3: x,y,z from my_func

with tf.Session() as sess:
    va_sign = sess.run([output])[0] # Unnest single-element list
    for entry in va_sign:
      print(entry.shape)

This will yield 3 elements, each of shape (86, 75000).
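A side note before unbatching: tensors that come out of numpy_function have no static shape, since TensorFlow cannot inspect the Python function. A minimal sketch, assuming the (86, 75000) per-signal shape from the question, of declaring it explicitly inside load_sign:

def load_sign(path):
    x, y, z = tf.compat.v1.numpy_function(
        my_func, [path], [tf.float64, tf.float64, tf.float64])
    # numpy_function cannot infer static shapes; declare them so that
    # downstream tf.data transformations can rely on them.
    # (86, 75000) is assumed from the file layout in the question.
    for t in (x, y, z):
        t.set_shape([86, 75000])
    return x, y, z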

To further preprocess your data and reach (75000,), you can make use of tf.data.Dataset.unbatch():

AUTOTUNE = tf.data.experimental.AUTOTUNE  
ds = tf.data.Dataset.from_tensor_slices(s)
ds = ds.map(load_sign, num_parallel_calls=AUTOTUNE).unbatch()

itera = tf.data.make_one_shot_iterator(ds)
output = itera.get_next() # Returns tuple of 3: x,y,z from my_func

The same iteration as above will now give you three elements of shape (75000,).
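If you need to stay on tf 1.14, where the Dataset.unbatch() method does not exist yet (see the comments below), the same transformation should be reachable through Dataset.apply() and the experimental endpoint; a sketch I have not verified on 1.14 itself:

AUTOTUNE = tf.data.experimental.AUTOTUNE
ds = tf.data.Dataset.from_tensor_slices(s)
ds = ds.map(load_sign, num_parallel_calls=AUTOTUNE)
# Pre-1.15 spelling of .unbatch(): apply the experimental transformation.
ds = ds.apply(tf.data.experimental.unbatch())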


3 Comments

Thank you so much! I think unbatch doesn't exist in tf 1.14, but I have tried your solution on Colab with tf 1.15 and it works fine, so I'll just update my tf version. Just one question: when I try to count the number of elements in my dataset I would expect 86 elements, each with 75000 samples, but I get 172. In fact, when the dataset should be out of range it restarts, as if I had used repeat(2). Why does this happen?
Ah yes, I tested it on Colab with tf 1.15, so that's why. It probably has 172 elements because in the example you provided s = [1, 2]. The dataset has two elements, therefore .map is executed two times. Then the dataset is unbatched and you are left with 86*2=172.
Oops, I didn't notice I used the full s; yesterday in my tests I was using s[0].
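To double-check the element count discussed in the comments, a quick sketch (not from the thread) that drains the iterator in graph mode and catches the end-of-sequence error:

ds = tf.data.Dataset.from_tensor_slices(s)
ds = ds.map(load_sign, num_parallel_calls=AUTOTUNE).unbatch()

itera = tf.data.make_one_shot_iterator(ds)
nxt = itera.get_next()

count = 0
with tf.Session() as sess:
    try:
        while True:  # run until the dataset is exhausted
            sess.run(nxt)
            count += 1
    except tf.errors.OutOfRangeError:
        pass
print(count)  # 172 with s = [1, 2]: two files, 86 rows each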
