6

I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:

               s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

I would like the resulting numpy.ndarray to be the following with np.shape() = (2,2,4):

[[[ 0.0  0.0  0.8  0.2 ]
  [ 0.1  0.0  0.9  0.0 ]]

 [[ 0.0  0.0  0.9  0.1 ]
  [ 0.0  0.0  1.0  0.0]]]

I have tried df.as_matrix() but this returns:

 [[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]
  [ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]

How do I return a list of lists for the first level, with each inner list holding the records for one Action?

2
  • 2
    Just reshape afterwards? Commented Sep 6, 2017 at 15:25
  • 1
    The shape in your result looks like (2, 2, 4). Commented Sep 6, 2017 at 20:08

5 Answers

5

You could use the following:

dim = len(df.index.get_level_values(0).unique())
# -1 lets NumPy infer the number of rows per group
result = df.values.reshape((dim, -1, df.shape[1]))
print(result)
[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]

The first line just finds the number of groups, i.e. the number of unique values in the first index level.

Why this (or groupby) is needed: as soon as you use .values, you lose the dimensionality of the MultiIndex from pandas. So you need to re-pass that dimensionality to NumPy in some way.
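A minimal sketch of that idea, assuming the question's DataFrame is rebuilt by hand (the construction below is illustrative, not the asker's code) and that every Action/State pair is present. df.index.levshape reports the size of each index level, so the shape does not have to be hard-coded:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# 2-D array: the MultiIndex structure is gone at this point.
flat = df.to_numpy()

# One axis per index level, then the columns on the last axis.
cube = flat.reshape(*df.index.levshape, -1)
print(cube.shape)
```

This only works when the rows are sorted by the index levels and no combination is missing; otherwise the reshape either fails or silently misaligns rows.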


1 Comment

Note that .to_numpy() is now preferred over .values, and that this method assumes every Action/State combination is present in your DataFrame.
1

One way

In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2], 
        [0.1, 0.0, 0.9, 0.0]],
       [[0.0, 0.0, 0.9, 0.1],
        [0.0, 0.0, 1.0, 0.0]]], dtype=object)

1 Comment

Unfortunately this array does not have the same dimensions as the intended array: np.shape() of your result gives (2,), whereas the intended np.shape() is (2,2,4).
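If the groupby route is preferred, one possible variant (a sketch, not the answerer's code) is to collect each group's 2-D array and stack them with np.stack, which yields a regular float array instead of an object array. The DataFrame construction is assumed from the question, and every group must have the same number of rows:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# One 2-D block per Action; groupby iterates the groups in sorted key order.
blocks = [group.to_numpy() for _, group in df.groupby(level=0)]

# Stack along a new leading axis; raises ValueError if block shapes differ.
stacked = np.stack(blocks)
print(stacked.shape)
```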
0

Using Divakar's suggestion, np.reshape() worked:

>>> print(P)

              s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

>>> np.reshape(P.to_numpy(), (2, 2, -1))

[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]

>>> np.shape(np.reshape(P.to_numpy(), (2, 2, -1)))

(2, 2, 4)

1 Comment

Thought you wanted a more generic solution ... whatever works!
0

Elaborating on Brad Solomon's answer, to get a slightly more generic solution (index levels of different sizes and an arbitrary number of levels), one could do something like this:

def df_to_numpy(df):
    try:
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        shape.append(ncol)
    return df.to_numpy().reshape(shape)

If df has missing sub-indexes, reshape will not work. One way to add them would be the following (there may be better solutions):

def enforce_df_shape(df):
    try:
        ind = pd.MultiIndex.from_product([level.values for level in df.index.levels])
    except AttributeError:
        return df
    fulldf = pd.DataFrame(-1, columns=df.columns, index=ind)  # remove -1 to fill fulldf with nan
    fulldf.update(df)
    return fulldf
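As one possible alternative, the missing rows can also be filled in with DataFrame.reindex against the full product of the index levels. The sketch below assumes a hypothetical frame where the (2, "s2") row is absent; missing rows come back as NaN, and the result can then be reshaped:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the (2, "s2") row missing.
index = pd.MultiIndex.from_tuples(
    [(1, "s1"), (1, "s2"), (2, "s1")], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9, 0.0],
     [0.0, 0.0, 0.9, 0.1]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# Every combination of the existing level values.
full_index = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)

# Rows absent from df are inserted and filled with NaN.
full = df.reindex(full_index)

cube = full.to_numpy().reshape(*full.index.levshape, -1)
print(cube.shape)
```

Passing fill_value to reindex would substitute another placeholder for NaN if that is preferable.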


0

If you are just trying to pull out one column, say s1, and get an array with shape (2,2), you can use .index.levshape like this:

x = df.s1.to_numpy().reshape(df.index.levshape)

This will give you a (2,2) array containing the values of s1.
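For a single column, Series.unstack is another option worth considering: it pivots the last index level into columns, so the layout follows the index labels rather than row order. A short sketch comparing the two, with the DataFrame construction assumed from the question:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# Reshape approach from the answer: relies on row order and a complete index.
by_reshape = df["s1"].to_numpy().reshape(df.index.levshape)

# unstack approach: rows are Actions, columns are States.
by_unstack = df["s1"].unstack().to_numpy()

print(np.array_equal(by_reshape, by_unstack))
```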
