Create a numpy array from columns of a pandas dataframe

Question

I have a DataFrame that looks like this:

A    B    C
1    2    3
1    5    3
4    8    2
4    2    1

I would like to create a NumPy array from this data using column A as the index, column B as the column headers and column C as the fill data.

Ultimately, it should look like this:

     2    5    8
1    3    3    
4    1         2

Is there a good way to do this?

I have tried df.pivot_table, but I'm worried I have messed up the data, and I would rather do it in another, more intuitive way.

No, you can't have empty cells in a NumPy array. Why not fill those empty cells with a placeholder value such as 0, NaN or something similar? – Divakar, commented Nov 15, 2016 at 19:30
Yup, filling with zeros would work great. I was just going to apply df.fillna(0). – Nate, commented Nov 15, 2016 at 19:58

piRSquared · Accepted Answer · 2016-11-15 19:37:44Z

4

Manipulate the DataFrame like this:

df.set_index(['A', 'B']).C.unstack()

Or

df.set_index(['A', 'B']).C.unstack(fill_value='')
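To see what this does: set_index(['A', 'B']) turns A and B into a two-level MultiIndex, so C becomes a Series keyed by (A, B) pairs, and unstack() moves the innermost level (B) into the columns. A minimal sketch of the intermediate step, rebuilding the question's sample data by hand:

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({'A': [1, 1, 4, 4],
                   'B': [2, 5, 8, 2],
                   'C': [3, 3, 2, 1]})

# C as a Series indexed by the (A, B) pairs
s = df.set_index(['A', 'B'])['C']
print(list(s.index.names))  # ['A', 'B']

# unstack() pivots the last index level (B) into the columns;
# s.unstack('A') would put A in the columns instead (the transpose)
wide = s.unstack()
print(wide)
```

Passing a level name or position to unstack lets you choose which key ends up as the column headers.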

Get the NumPy array like this:

df.set_index(['A', 'B']).C.unstack().values

array([[  3.,   3.,  nan],
       [  1.,  nan,   2.]])

Or

df.set_index(['A', 'B']).C.unstack(fill_value='').values

array([[3, 3, ''],
       [1, '', 2]], dtype=object)
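If zeros are preferred (as discussed in the comments under the question), a numeric fill_value keeps the result numeric instead of object dtype. A NumPy array also drops the labels, so this minimal sketch keeps the A and B values alongside it. It uses to_numpy(), the recommended replacement for .values since pandas 0.24; the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 4, 4],
                   'B': [2, 5, 8, 2],
                   'C': [3, 3, 2, 1]})

# Fill missing (A, B) combinations with 0 instead of NaN or ''
table = df.set_index(['A', 'B'])['C'].unstack(fill_value=0)

values = table.to_numpy()              # the 2-D data
row_labels = table.index.to_numpy()    # the A value for each row
col_labels = table.columns.to_numpy()  # the B value for each column

print(values)
# [[3 3 0]
#  [1 0 2]]
print(row_labels, col_labels)
# [1 4] [2 5 8]
```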

answered Nov 15, 2016 at 19:37

piRSquared


Or, pandas.pivot_table. – Kartik, over a year ago
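Along the lines of that comment, DataFrame.pivot does the same reshaping. Unlike pivot_table it never aggregates, and it raises a ValueError if any (A, B) pair occurs more than once, which makes it a quick check that no data was silently combined. A minimal sketch (the duplicated row is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 4, 4],
                   'B': [2, 5, 8, 2],
                   'C': [3, 3, 2, 1]})

# pivot only reshapes; each (A, B) pair must be unique
table = df.pivot(index='A', columns='B', values='C')
arr = table.fillna(0).to_numpy()  # missing cells become 0.0 (float dtype)

# Add a made-up row that repeats the pair (A=1, B=2)
dup = pd.concat([df, pd.DataFrame({'A': [1], 'B': [2], 'C': [9]})],
                ignore_index=True)

raised = False
try:
    dup.pivot(index='A', columns='B', values='C')
except ValueError:
    raised = True  # pandas refuses to reshape duplicate entries
print(raised)  # True
```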

Divakar · Answer · 2016-11-15 19:55:16Z

1

Pandas unstack looked nice! So I tried to replicate the same behavior with NumPy, so that it works directly on arrays, and ended up with something like this:

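import numpy as np

def numpy_unstack(a, fillval=0):
    # Map each distinct value in column 0 (rows) and column 1 (columns) to a 0-based position
    r = np.unique(a[:, 0], return_inverse=True)[1]
    c = np.unique(a[:, 1], return_inverse=True)[1]
    # Float output so np.nan works as a fill value (older NumPy defaulted np.full to float)
    out = np.full((r.max() + 1, c.max() + 1), fillval, dtype=float)
    # Scatter the values in column 2 into their (row, column) cells
    out[r, c] = a[:, 2]
    return out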

Sample run:

In [81]: df
Out[81]: 
   0  1  2
0  1  2  3
1  1  5  3
2  4  8  2
3  4  2  1

In [82]: numpy_unstack(df.values,0)
Out[82]: 
array([[ 3.,  3.,  0.],
       [ 1.,  0.,  2.]])

In [83]: numpy_unstack(df.values,np.nan)
Out[83]: 
array([[  3.,   3.,  nan],
       [  1.,  nan,   2.]])
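Since np.unique already returns the sorted distinct values along with the inverse indices, the same function can hand back the row and column labels too. A minimal sketch of such a variant; the name numpy_unstack_labeled is hypothetical, and it assumes each (row, column) pair appears at most once (with duplicates, only one of the values is kept):

```python
import numpy as np

def numpy_unstack_labeled(a, fillval=np.nan):
    """Hypothetical variant of numpy_unstack that also returns labels.

    a: 2-D array whose columns are (row key, column key, value).
    Returns (out, row_labels, col_labels).
    """
    # Sorted distinct keys plus each element's position among them
    row_labels, r = np.unique(a[:, 0], return_inverse=True)
    col_labels, c = np.unique(a[:, 1], return_inverse=True)
    out = np.full((len(row_labels), len(col_labels)), fillval, dtype=float)
    out[r, c] = a[:, 2]
    return out, row_labels, col_labels

a = np.array([[1, 2, 3],
              [1, 5, 3],
              [4, 8, 2],
              [4, 2, 1]])
out, rows, cols = numpy_unstack_labeled(a, fillval=0)
print(rows, cols)  # [1 4] [2 5 8]
print(out)
```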

answered Nov 15, 2016 at 19:55

Divakar


Zero · Answer · 2017-08-09 20:06:49Z

1

As mentioned above, you can use pd.pivot_table:

In [1655]: df.pivot_table(index='A', columns='B', values='C', fill_value='')
Out[1655]:
B  2  5  8
A
1  3  3
4  1     2
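Regarding the worry about messing up the data: pivot_table aggregates, so duplicate (A, B) pairs are combined rather than reported, and the default aggfunc is 'mean'. A minimal sketch with one made-up row added that repeats the pair (A=1, B=2), comparing the default with aggfunc='sum' and using a numeric fill_value so the result converts cleanly to a numeric array:

```python
import pandas as pd

# The question's data plus one made-up row repeating the pair (A=1, B=2)
df = pd.DataFrame({'A': [1, 1, 4, 4, 1],
                   'B': [2, 5, 8, 2, 2],
                   'C': [3, 3, 2, 1, 5]})

# Default aggfunc='mean': the two values for (1, 2) are averaged to 4
mean_table = df.pivot_table(index='A', columns='B', values='C', fill_value=0)

# aggfunc='sum' adds them instead (3 + 5 = 8)
sum_table = df.pivot_table(index='A', columns='B', values='C',
                           aggfunc='sum', fill_value=0)

print(mean_table.to_numpy())
print(sum_table.to_numpy())
```

If every (A, B) pair is unique, as in the question's data, the aggregation has nothing to combine and the result matches the unstack approach.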

answered Aug 9, 2017 at 20:06

Zero
