
I've got a dataframe like

xs = pd.DataFrame({
    'batch1': {
        'timestep1': [1, 2, 3],
        'timestep2': [3, 2, 1]
    }
}).T

DataFrame where each cell is a list

and I want to convert it into a numpy array of shape (batch, timestep, feature). For xs that should be (1, 2, 3).

The issue is that pandas only knows about the 2D shape, so to_numpy produces a 2D array.

xs.to_numpy().shape  # (1, 2)

Similarly, this prevents using np.reshape, because numpy doesn't seem to see the innermost dimension as an array:

xs.to_numpy().reshape((1,2,3))  # ValueError: cannot reshape array of size 2 into shape (1,2,3)
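For reference, inspecting the result shows what is going on: to_numpy() returns an object-dtype array whose cells are still Python lists, which is why reshape can't split a third axis out of them. A minimal sketch of the diagnosis and one possible workaround, assuming every cell holds a list of the same length:

import numpy as np

arr = xs.to_numpy()
print(arr.dtype)   # object -- each cell is still a Python list
print(arr[0, 0])   # [1, 2, 3]

# Promote the inner lists to a real third axis by rebuilding the array
# from nested lists (only works if all cells have the same length).
promoted = np.array(arr.tolist())
print(promoted.shape)  # (1, 2, 3)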

[Edit] Add context on how the dataframe arrived in this state.

The dataframe originally started as

xs = pd.DataFrame({
    ('batch1','timestep1'): {
            'feature1': 1,
            'feature2': 2,
            'feature3': 3
        },
    ('batch1', 'timestep2'): {
            'feature1': 3,
            'feature2': 2,
            'feature3': 1
        }
    }
).T

MultiIndex dataframe

which I decomposed into the nested list/array using

xs.apply(pd.DataFrame.to_numpy, axis=1).unstack()

Unstacked dataframe

2 Comments
  • Have you looked at what to_numpy produces? (not just its shape) Commented Feb 4, 2021 at 16:18
  • Yep. It produces the expected 2D shape, i.e. xs.to_numpy().shape # (1, 2), and if you check the innermost dimension you can see the correct length: xs.to_numpy()[0][0].shape # (3,). So I'm stuck trying to promote that innermost shape up one level, I think. Commented Feb 4, 2021 at 16:46

1 Answer

import pandas as pd

xs = pd.DataFrame({
    'batch1': {
        'timestep1': [1, 2, 3],
        'timestep2': [3, 2, 1]
    }
}).T

# Explode each list column into its own rows, then join the two exploded
# columns side by side on the repeated 'batch1' index.
xs = pd.concat(
    (xs.explode('timestep1').drop('timestep2', axis=1),
     xs.explode('timestep2').drop('timestep1', axis=1)),
    axis=1
)
print(xs, '\n')

# Transpose so timesteps come before features, then reshape to
# (batch, timestep, feature).
n = xs.to_numpy().T.reshape(1, 2, 3)
print(n)

Output:

       timestep1 timestep2
batch1         1         3
batch1         2         2
batch1         3         1 

[[[1 2 3]
  [3 2 1]]]
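The idea: explode turns each list element into its own row, and because both exploded frames repeat the batch1 label in the same order, concat lines them up into a plain (3, 2) features-by-timesteps block; transposing before the reshape then puts the values in (batch, timestep, feature) order.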

EDIT

Starting from your original data frame you can do:

xs = pd.DataFrame({
    ('batch1','timestep1'): {
            'feature1': 1,
            'feature2': 2,
            'feature3': 3
        },
    ('batch1', 'timestep2'): {
            'feature1': 3,
            'feature2': 2,
            'feature3': 1
        },
    ('batch2','timestep1'): {
            'feature1': 4,
            'feature2': 5,
            'feature3': 6
        },
    ('batch2', 'timestep2'): {
            'feature1': 7,
            'feature2': 8,
            'feature3': 9
        }
    }
).T


# Rows are already ordered by (batch, timestep) and columns are the features,
# so a straight reshape gives (batch, timestep, feature).
array = xs.to_numpy().reshape(2, 2, 3)
print(array)

Output:

[[[1 2 3]
  [3 2 1]]

 [[4 5 6]
  [7 8 9]]]
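If the dimensions shouldn't be hard-coded, they can be read off the frame itself. A small sketch, assuming the MultiIndex is complete (every batch has the same timesteps, nothing jagged); the n_* names are just for illustration:

# Derive the target shape from the index levels and columns instead of
# hard-coding it (assumes a complete, non-jagged MultiIndex).
n_batch = xs.index.get_level_values(0).nunique()
n_timestep = xs.index.get_level_values(1).nunique()
n_feature = xs.shape[1]

array = xs.to_numpy().reshape(n_batch, n_timestep, n_feature)
print(array.shape)  # (2, 2, 3)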

5 Comments

Could the explode/drop be avoided if the DataFrame started as a MultiIndex? i.e. (batch, timestep) = [feature]
Could you show how you would transform your data frame into such a MultiIndex?
Sure. Edited the question description.
See the Edit in the post.
Thanks! The issue was my original dataframe was jagged. Once I leveled all the timesteps I was able to to_numpy().reshape as expected.
