
I have a DataFrame that looks like this:

A    B    C
1    2    3
1    5    3
4    8    2
4    2    1

I would like to create a NumPy array from this data using column A as the index, column B as the column headers and column C as the fill data.

Ultimately, it should look like this:

     2    5    8
1    3    3    
4    1         2

Is there a good way to do this?

I have tried df.pivot_table, but I'm worried I have messed up the data, and I would rather do it in another, more intuitive way.

  • No, you can't have empty cells in an array. Why not fill those empty cells with a placeholder value such as 0 or NaN? Commented Nov 15, 2016 at 19:30
  • Yup, filling with zeros would work great. I was just going to apply df.fillna(0). Commented Nov 15, 2016 at 19:58
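Combining the question's goal with the df.fillna(0) idea from the comment, a minimal sketch could use DataFrame.pivot. It assumes pandas 0.24+ (for DataFrame.to_numpy) and that each (A, B) pair occurs at most once, because pivot does not aggregate duplicates:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": [1, 1, 4, 4], "B": [2, 5, 8, 2], "C": [3, 3, 2, 1]})

# A -> row labels, B -> column labels, C -> cell values.
# pivot() raises ValueError if an (A, B) pair occurs more than once.
wide = df.pivot(index="A", columns="B", values="C")

# Missing (A, B) combinations come back as NaN; replace them with 0
arr = wide.fillna(0).to_numpy()
print(arr)
```

The row and column labels come out sorted (A = 1, 4 and B = 2, 5, 8), which matches the layout in the question. Because NaN forces a float dtype during the reshape, the result is a float array even after filling with 0.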

3 Answers


Reshape the DataFrame like this:

df.set_index(['A', 'B']).C.unstack()


Or

df.set_index(['A', 'B']).C.unstack(fill_value='')

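To see why this works, it may help to look at the intermediate step: set_index(['A', 'B']) turns the (A, B) pairs into a two-level MultiIndex, .C selects the values as a Series, and unstack() moves the innermost index level (B) into the columns. A small sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 4, 4], "B": [2, 5, 8, 2], "C": [3, 3, 2, 1]})

# Series of C values indexed by the (A, B) pairs
s = df.set_index(["A", "B"]).C
print(s)

# unstack() pivots the last index level (B) into columns;
# combinations that never occur in the data are filled with NaN
wide = s.unstack()
print(wide)
```

Note that unstack() raises a ValueError if the same (A, B) pair appears more than once, since it has no way to decide which C value belongs in that cell.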


Get the NumPy array with .values (in pandas 0.24 and later, .to_numpy() is the recommended equivalent):

df.set_index(['A', 'B']).C.unstack().values

array([[  3.,   3.,  nan],
       [  1.,  nan,   2.]])

Or

df.set_index(['A', 'B']).C.unstack(fill_value='').values

array([[3, 3, ''],
       [1, '', 2]], dtype=object)
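A plain NumPy array drops the row and column labels. If they are needed later, one option (a sketch, assuming pandas 0.24+ for to_numpy) is to keep them from the unstacked frame alongside the values. Filling with 0 instead of '' also avoids the object dtype shown above:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 4, 4], "B": [2, 5, 8, 2], "C": [3, 3, 2, 1]})

# Numeric fill value keeps the result numeric instead of dtype=object
wide = df.set_index(["A", "B"]).C.unstack(fill_value=0)

values = wide.to_numpy()              # 2-D array of C values
row_labels = wide.index.to_numpy()    # values of A, one per row
col_labels = wide.columns.to_numpy()  # values of B, one per column
```

Here values[i, j] is the C value for A = row_labels[i] and B = col_labels[j].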

1 Comment

  • Or, pandas.pivot_table

Pandas' unstack looked nice! So I tried to replicate the same behavior in NumPy, so that it works directly on arrays, and ended up with something like this:

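import numpy as np

def numpy_unstack(a, fillval=0):
    # Map each row key / column key to its position among the sorted unique keys
    r = np.unique(a[:, 0], return_inverse=True)[1]
    c = np.unique(a[:, 1], return_inverse=True)[1]
    # Start from a grid filled with fillval, then scatter the values into it
    out = np.full((r.max() + 1, c.max() + 1), fillval)
    out[r, c] = a[:, 2]
    return out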

Sample run:

In [81]: df
Out[81]: 
   0  1  2
0  1  2  3
1  1  5  3
2  4  8  2
3  4  2  1

In [82]: numpy_unstack(df.values,0)
Out[82]: 
array([[3, 3, 0],
       [1, 0, 2]])

In [83]: numpy_unstack(df.values,np.nan)
Out[83]: 
array([[  3.,   3.,  nan],
       [  1.,  nan,   2.]])
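numpy_unstack discards the first output of np.unique, which is the sorted array of unique keys, i.e. exactly the row and column headers. A hypothetical variant (the name numpy_unstack_labeled is made up for this sketch) could return them as well:

```python
import numpy as np

def numpy_unstack_labeled(a, fillval=0):
    """Hypothetical variant of numpy_unstack that also returns the labels.

    a: 2-D array whose columns are (row key, column key, value).
    """
    # np.unique returns the sorted unique keys and, with return_inverse=True,
    # the position of each input key within those unique keys
    row_keys, r = np.unique(a[:, 0], return_inverse=True)
    col_keys, c = np.unique(a[:, 1], return_inverse=True)
    out = np.full((len(row_keys), len(col_keys)), fillval)
    # Scatter the values; duplicate (row, column) pairs overwrite each other
    # rather than being aggregated
    out[r, c] = a[:, 2]
    return out, row_keys, col_keys

a = np.array([[1, 2, 3], [1, 5, 3], [4, 8, 2], [4, 2, 1]])
out, rows, cols = numpy_unstack_labeled(a, fillval=np.nan)
```

Passing np.nan as the fill value makes out a float array, so the integer values from a are stored as floats.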


As mentioned above, you can use pivot_table like this:

In [1655]: df.pivot_table(index='A', columns='B', values='C', fill_value='')
Out[1655]:
B  2  5  8
A
1  3  3
4  1     2
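One practical difference between the approaches: pivot_table aggregates repeated (A, B) pairs (with the mean by default), whereas set_index(...).unstack() raises a ValueError for them. A sketch with a hypothetical duplicated row added to the question's data:

```python
import pandas as pd

# Question's data plus a hypothetical extra row repeating the pair (A=1, B=2)
df = pd.DataFrame({"A": [1, 1, 4, 4, 1],
                   "B": [2, 5, 8, 2, 2],
                   "C": [3, 3, 2, 1, 5]})

# Duplicates are combined with aggfunc: mean of 3 and 5 -> 4
wide = df.pivot_table(index="A", columns="B", values="C",
                      aggfunc="mean", fill_value=0)
arr = wide.to_numpy()

# The set_index/unstack route refuses the same input
try:
    df.set_index(["A", "B"]).C.unstack()
    unstack_error = None
except ValueError as exc:
    unstack_error = exc
```

If duplicates should never occur in the data, the error from unstack (or pivot) can serve as a useful check; if they can occur, pivot_table with an explicit aggfunc makes the choice of how to combine them visible.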
