I am working with a ROOT file that contains an array of arrays. When I load it into Python, I get an Awkward Array, because the inner arrays have varying sizes. I would like to convert this to a NumPy array in which every inner array has the same size, filling the empty elements with NaN. How can I convert an Awkward Array with variable-length entries to a NumPy array?

  • Can you show some example of your input and your desired output? Commented Nov 4, 2021 at 20:09
  • There are various ways of 'padding' arrays. But it could help if you give us a sense of the array shapes (and dtype). np.nan is a float. Padding a bunch of 1d arrays will be different from 2d or more. A search on '[numpy] padding' should turn up a lot of SO answers Commented Nov 4, 2021 at 20:58

2 Answers

Suppose that you have an array of variable-length lists a:

>>> import numpy as np
>>> import awkward as ak
>>> a = ak.Array([[0, 1, 2], [], [3, 4], [5], [6, 7, 8, 9]])
>>> a
<Array [[0, 1, 2], [], ... [5], [6, 7, 8, 9]] type='5 * var * int64'>

The function that makes all lists have the same size is ak.pad_none. But first, we need a size to pad it to. We can get the length of each list with ak.num and then take the np.max of that.

>>> ak.num(a)
<Array [3, 0, 2, 1, 4] type='5 * int64'>
>>> desired_length = np.max(ak.num(a))
>>> desired_length
4

Now we can pad it and convert that into a NumPy array (because it now has rectangular shape).

>>> ak.pad_none(a, desired_length)
<Array [[0, 1, 2, None], ... [6, 7, 8, 9]] type='5 * var * ?int64'>
>>> ak.to_numpy(ak.pad_none(a, desired_length))
masked_array(
  data=[[0, 1, 2, --],
        [--, --, --, --],
        [3, 4, --, --],
        [5, --, --, --],
        [6, 7, 8, 9]],
  mask=[[False, False, False,  True],
        [ True,  True,  True,  True],
        [False, False,  True,  True],
        [False,  True,  True,  True],
        [False, False, False, False]],
  fill_value=999999)

The missing values (None) become masked entries in a NumPy masked array. If you want a plain NumPy array, use ak.fill_none to give them a replacement value.

>>> ak.to_numpy(ak.fill_none(ak.pad_none(a, desired_length), 999))
array([[  0,   1,   2, 999],
       [999, 999, 999, 999],
       [  3,   4, 999, 999],
       [  5, 999, 999, 999],
       [  6,   7,   8,   9]])

For plain Python lists, you can pad the shorter list with None up to the length of the longest one:

a = [1,2,3,4,5]
b = [1,2,3]
c = max(len(a),len(b))

for _ in range(len(a), c):
    a.append(None)
for _ in range(len(b), c):
    b.append(None)

Result would be as follows:

a = [1, 2, 3, 4, 5] 
b = [1, 2, 3, None, None]
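
If the goal is a NumPy array padded with NaN, as the question asks, the same idea can be sketched without Awkward Array by pre-filling a float array with NaN and copying each list into its row. The function name pad_lists_with_nan is hypothetical, and the sketch assumes the input is a list of flat lists of numbers.

```python
import numpy as np


def pad_lists_with_nan(lists):
    """Return a 2D float array; rows shorter than the longest are NaN-padded.

    Hypothetical helper for illustration; assumes flat lists of numbers.
    """
    # Width of the output is the length of the longest list (0 if none).
    width = max((len(row) for row in lists), default=0)
    # Start from an array full of NaN, then overwrite the known values.
    out = np.full((len(lists), width), np.nan, dtype=np.float64)
    for i, row in enumerate(lists):
        out[i, :len(row)] = row
    return out


padded = pad_lists_with_nan([[1, 2, 3, 4, 5], [1, 2, 3]])
```

The result is a float array because NaN cannot be stored in an integer array.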
