
I've got a dataframe like

xs = pd.DataFrame({
    'batch1': {
        'timestep1': [1, 2, 3],
        'timestep2': [3, 2, 1]
    }
}).T

DataFrame where each cell is a list

and I want to convert it into a numpy array of shape (batch, timestep, feature). For xs that should be (1, 2, 3).

The issue is that pandas only knows about the 2D shape, so to_numpy produces a 2D array.

xs.to_numpy().shape  # (1, 2)

Similarly, this prevents using np.reshape, because numpy doesn't seem to see the innermost dimension as an array:

xs.to_numpy().reshape((1,2,3))  # ValueError: cannot reshape array of size 2 into shape (1,2,3)
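For reference, inspecting the result shows what is going on: to_numpy() returns an object-dtype array whose cells are still Python lists, which is why reshape can't split a third axis out of them. A minimal sketch of the diagnosis and one possible workaround, assuming every cell holds a list of the same length:

import numpy as np

arr = xs.to_numpy()
print(arr.dtype)   # object -- each cell is still a Python list
print(arr[0, 0])   # [1, 2, 3]

# Promote the inner lists to a real third axis by rebuilding the array
# from nested lists (only works if all cells have the same length).
promoted = np.array(arr.tolist())
print(promoted.shape)  # (1, 2, 3)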

[Edit] Add context on how the dataframe arrived in this state.

The dataframe originally started as

xs = pd.DataFrame({
    ('batch1','timestep1'): {
            'feature1': 1,
            'feature2': 2,
            'feature3': 3
        },
    ('batch1', 'timestep2'): {
            'feature1': 3,
            'feature2': 2,
            'feature3': 1
        }
    }
).T

MultiIndex dataframe

which I decomposed into the nested list/array using

xs.apply(pd.DataFrame.to_numpy, axis=1).unstack()

Unstacked dataframe

2 Comments
  • Have you looked at what to_numpy produces? (not just its shape) Commented Feb 4, 2021 at 16:18
  • Yep. It produces the expected 2D shape, i.e. xs.to_numpy().shape # (1, 2), and if you check the innermost dimension you can see the correct length: xs.to_numpy()[0][0].shape # (3,). So I'm stuck trying to promote that innermost shape up one level, I think. Commented Feb 4, 2021 at 16:46

1 Answer

import pandas as pd

xs = pd.DataFrame({
    'batch1': {
        'timestep1': [1, 2, 3],
        'timestep2': [3, 2, 1]
    }
}).T

# Explode each list column into its own rows, then join the two exploded
# columns side by side on the repeated 'batch1' index.
xs = pd.concat(
    (xs.explode('timestep1').drop('timestep2', axis=1),
     xs.explode('timestep2').drop('timestep1', axis=1)),
    axis=1
)
print(xs, '\n')

# Transpose so timesteps come before features, then reshape to
# (batch, timestep, feature).
n = xs.to_numpy().T.reshape(1, 2, 3)
print(n)

Output:

       timestep1 timestep2
batch1         1         3
batch1         2         2
batch1         3         1 

[[[1 2 3]
  [3 2 1]]]
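The idea: explode turns each list element into its own row, and because both exploded frames repeat the batch1 label in the same order, concat lines them up into a plain (3, 2) features-by-timesteps block; transposing before the reshape then puts the values in (batch, timestep, feature) order.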

EDIT

Starting from your original data frame you can do:

xs = pd.DataFrame({
    ('batch1','timestep1'): {
            'feature1': 1,
            'feature2': 2,
            'feature3': 3
        },
    ('batch1', 'timestep2'): {
            'feature1': 3,
            'feature2': 2,
            'feature3': 1
        },
    ('batch2','timestep1'): {
            'feature1': 4,
            'feature2': 5,
            'feature3': 6
        },
    ('batch2', 'timestep2'): {
            'feature1': 7,
            'feature2': 8,
            'feature3': 9
        }
    }
).T


# Rows are already ordered by (batch, timestep) and columns are the features,
# so a straight reshape gives (batch, timestep, feature).
array = xs.to_numpy().reshape(2, 2, 3)
print(array)

Output:

[[[1 2 3]
  [3 2 1]]

 [[4 5 6]
  [7 8 9]]]
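If the dimensions shouldn't be hard-coded, they can be read off the frame itself. A small sketch, assuming the MultiIndex is complete (every batch has the same timesteps, nothing jagged); the n_* names are just for illustration:

# Derive the target shape from the index levels and columns instead of
# hard-coding it (assumes a complete, non-jagged MultiIndex).
n_batch = xs.index.get_level_values(0).nunique()
n_timestep = xs.index.get_level_values(1).nunique()
n_feature = xs.shape[1]

array = xs.to_numpy().reshape(n_batch, n_timestep, n_feature)
print(array.shape)  # (2, 2, 3)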

5 Comments

Could the explode/drop be avoided if the DataFrame started as a MultiIndex? i.e. (batch, timestep) = [feature]
Could you show how you would transform your data frame into such a MultiIndex?
Sure. Edited the question description.
See the Edit in the post.
Thanks! The issue was my original dataframe was jagged. Once I leveled all the timesteps I was able to to_numpy().reshape as expected.
