I have a pandas Series, features, whose values (features.values) look like this:

array([array([0, 0, 0, ..., 0, 0, 0]), array([0, 0, 0, ..., 0, 0, 0]),
       array([0, 0, 0, ..., 0, 0, 0]), ...,
       array([0, 0, 0, ..., 0, 0, 0]), array([0, 0, 0, ..., 0, 0, 0]),
       array([0, 0, 0, ..., 0, 0, 0])], dtype=object)

I want this to be treated as a matrix, but when I check the shape I get

>>> features.values.shape
(10000,)

rather than (10000, 3000) which is what I would expect.

How can I get this recognized as a 2d array rather than a 1d array with arrays as values? Also, why is it not automatically detected as a 2d array?

  • possible duplicate: stackoverflow.com/questions/42920363/… Commented Jun 21, 2018 at 15:21
  • Try np.stack(features). It treats the array as a list of arrays and concatenates them on a new axis. np.vstack(features) would also work in this case. That's assuming that all internal arrays have the same shape. Commented Jun 21, 2018 at 16:18
  • @anishtain4, your link is for a pandas dataframe, not a numpy array. Commented Jun 21, 2018 at 16:19
  • @hpaulj "I have a pandas series" Commented Jun 21, 2018 at 16:33
  • @hpaulj np.stack worked great. I just really don't understand why features.values doesn't return it as such, or why numpy doesn't recognize it as a 2d array. Thank you! Commented Jun 21, 2018 at 17:32

2 Answers

In response to your comment question, let's compare two ways of creating an array.

First make an array from a list of arrays (all same length):

In [302]: arr = np.array([np.arange(3), np.arange(1,4), np.arange(10,13)])
In [303]: arr
Out[303]: 
array([[ 0,  1,  2],
       [ 1,  2,  3],
       [10, 11, 12]])

The result is a 2d array of numbers.

If instead we make an object dtype array, and fill it with arrays:

In [304]: arr = np.empty(3,object)
In [305]: arr[:] = [np.arange(3), np.arange(1,4), np.arange(10,13)]
In [306]: arr
Out[306]: 
array([array([0, 1, 2]), array([1, 2, 3]), array([10, 11, 12])],
      dtype=object)

Notice that this display is like yours. This is, by design, a 1d array. Like a list, it contains pointers to arrays stored elsewhere in memory. Notice also that it requires an extra construction step: the default behavior of np.array is to create a multidimensional array where it can.
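To make the difference concrete, here is a minimal sketch that builds both kinds of array from the same rows and inspects them. The names rows, numeric, and boxed are only illustrative, and the object array is filled element by element to keep the construction explicit.

```python
import numpy as np

rows = [np.arange(3), np.arange(1, 4), np.arange(10, 13)]

# np.array on a list of equal-length arrays builds a 2d numeric array.
numeric = np.array(rows)

# An object array has to be created explicitly and then filled.
boxed = np.empty(3, dtype=object)
for i, row in enumerate(rows):
    boxed[i] = row

print(numeric.shape, numeric.ndim)    # two dimensions of numbers
print(boxed.shape, boxed.dtype)       # one dimension of Python objects
print(type(boxed[0]), boxed[0].shape) # each element is itself an ndarray
```

The shape of the object array only counts its elements. numpy does not look inside them, which is why the inner length never shows up in the shape.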

It takes extra effort to get around that. Likewise, it takes some extra effort to undo it and create the 2d numeric array.

Simply calling np.array on it does not change the structure.

In [307]: np.array(arr)
Out[307]: 
array([array([0, 1, 2]), array([1, 2, 3]), array([10, 11, 12])],
      dtype=object)

np.stack does change it to 2d. It treats the object array as a list of arrays, which it joins on a new axis.

In [308]: np.stack(arr)
Out[308]: 
array([[ 0,  1,  2],
       [ 1,  2,  3],
       [10, 11, 12]])
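
Applied to the situation in the question, the same call might look like the sketch below. The Series here is a small hypothetical stand-in for features (4 rows of length 5 instead of 10000 rows of length 3000), and it assumes every row has the same length.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `features` Series:
# 4 rows, each holding a 1d integer array of length 5.
features = pd.Series([np.zeros(5, dtype=int) for _ in range(4)])

# The Series stores each array as a Python object, so .values is 1d.
print(features.values.shape)   # (4,)

# np.stack joins the inner arrays along a new leading axis.
matrix = np.stack(features.values)
print(matrix.shape)            # (4, 5)
```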

Shortening @hpaulj's answer:

your_2d_array = np.stack(arr_of_arr_object)
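
As noted in the comments, this only works when all the inner arrays have the same shape. Otherwise np.stack raises a ValueError. The sketch below wraps the call in a hypothetical helper, to_2d (not part of numpy), that checks the row lengths first and reports them if they differ.

```python
import numpy as np

def to_2d(obj_arr):
    """Hypothetical helper: stack a 1d object array of 1d arrays into a 2d array."""
    lengths = {len(a) for a in obj_arr}
    if len(lengths) != 1:
        # Fail early with the offending lengths instead of np.stack's generic error.
        raise ValueError(f"cannot stack rows of differing lengths: {sorted(lengths)}")
    return np.stack(obj_arr)

# Two rows of equal length stack cleanly.
good = np.empty(2, dtype=object)
good[0] = np.arange(3)
good[1] = np.arange(3, 6)
print(to_2d(good).shape)

# Rows of different lengths are rejected.
ragged = np.empty(2, dtype=object)
ragged[0] = np.arange(3)
ragged[1] = np.arange(4)
try:
    to_2d(ragged)
except ValueError as exc:
    print("not stackable:", exc)
```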
