
I have an array indexes that for each row contains the columns that should be filled. For example:

[array([[    2, 14098,  6824, 24207,  1215],
   [   51,  1277,  3197,  1052,  4076],......

And I have another array values containing the values that should be filled in those positions. For example:

array([[1, 7, 75, 82, 11],
       [11, 5, 8, 82, 811],...

This means that for row 0, column 2 should be filled with value '1', column 14098 should be filled with value '7'... for row 1, column 51 should be filled with value '11', column 1277 should be filled with value '5'...

A third array, a = np.zeros((100000, 100000)), is the one to be filled using the two previous arrays.

Right now I am using a nested loop to do it, but I'm pretty sure there is a better way:

for row_idx in range(indexes.shape[0]):
    for col_idx in range(indexes.shape[1]):
        column = indexes[row_idx, col_idx]
        a[row_idx, column] = values[row_idx, col_idx]

How can I fill the array in idiomatic Python/NumPy style (fancy indexing, broadcasting, ...)? What is the most memory-efficient way to do it, since I have limited RAM?

Thanks for your help in advance!

  • Does np.put_along_axis(a, indexes, values, axis=1) do what you want? Commented Apr 13, 2020 at 22:45
  • It worked! Thank you! Commented Apr 14, 2020 at 21:41

1 Answer


This can be done with np.put_along_axis. From the NumPy documentation:

Put values into the destination array by matching 1d index and data slices. This iterates over matching 1d slices oriented along the specified axis in the index and data arrays, and uses the former to place values into the latter. These slices can be different lengths.
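Applied to the arrays in the question, the call might look like the sketch below. The tiny shapes and the integer dtype are assumptions made so the example runs anywhere; the names indexes, values and a come from the question. Note axis=1: each row of indexes lists column positions within the same row of a.

```python
import numpy as np

# Small stand-ins for the arrays in the question (shapes are assumptions).
indexes = np.array([[2, 0, 4],
                    [1, 3, 0]])
values = np.array([[1, 7, 75],
                   [11, 5, 8]])

a = np.zeros((2, 5), dtype=values.dtype)

# axis=1: for each row i, a[i, indexes[i, j]] = values[i, j]
np.put_along_axis(a, indexes, values, axis=1)

print(a)
```

The write happens in place, so no second copy of a is allocated. The destination itself is still the dominant cost: a float64 array of shape (100000, 100000) needs about 80 GB.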

Here is an example, taken from another post:

In [50]: df
Out[50]: 
   datetime1  datetime2  datetime3  datetime4
1          5          6          5          5
2          7          2          3          5
3          4          2          3          2
4          6          4          4          7
5          7          3          8          9

In [51]: index_arr = np.array([3, 2, 0 ,1 ,2])

In [52]: replace_arr = np.array([14, 12, 23, 17 ,15])

In [53]: np.put_along_axis(df.to_numpy(),index_arr[:,None],replace_arr[:,None],axis=1)

In [54]: df
Out[54]: 
   datetime1  datetime2  datetime3  datetime4
1          5          6          5         14
2          7          2         12          5
3         23          2          3          2
4          6         17          4          7
5          7          3         15          9

As you can see, the value at row position 0, column position 3 (df.iloc[0, 3]) was changed from 5 to 14, and so on for the other rows. The same logic works for your problem.
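Since the question also mentions fancy indexing, here is a sketch of an equivalent assignment that pairs a column of row numbers with indexes. The small shapes are assumptions for illustration, and the nested loop from the question is used only to check the result.

```python
import numpy as np

indexes = np.array([[2, 0, 4],
                    [1, 3, 0]])
values = np.array([[1, 7, 75],
                   [11, 5, 8]])

# Reference result from the nested loop in the question.
expected = np.zeros((2, 5), dtype=values.dtype)
for row_idx in range(indexes.shape[0]):
    for col_idx in range(indexes.shape[1]):
        expected[row_idx, indexes[row_idx, col_idx]] = values[row_idx, col_idx]

# Vectorised version: a column of row numbers broadcasts against indexes.
a = np.zeros((2, 5), dtype=values.dtype)
rows = np.arange(indexes.shape[0])[:, None]  # shape (2, 1)
a[rows, indexes] = values

print(np.array_equal(a, expected))
```

rows has shape (2, 1) and indexes has shape (2, 3), so the two broadcast to one (row, column) pair per entry of values.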


1 Comment

How can we do this immutably?
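One possible approach, as a minimal sketch: np.put_along_axis always writes into the array it is given, so copy first and fill the copy. The helper name filled_copy is hypothetical, not a NumPy function.

```python
import numpy as np

def filled_copy(arr, indexes, values, axis=1):
    """Return a new array with values placed; arr itself is left untouched."""
    out = arr.copy()
    np.put_along_axis(out, indexes, values, axis=axis)
    return out

original = np.zeros((2, 4), dtype=int)
index_arr = np.array([3, 2])
replace_arr = np.array([14, 12])

result = filled_copy(original, index_arr[:, None], replace_arr[:, None])

print(original)  # still all zeros
print(result)
```

The copy doubles peak memory, so this only suits arrays that comfortably fit in RAM twice.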
