index numpy two arrays

Question

I have two arrays, and I would like to use one of them as a reference for the other. How can I do it? I have the following array A:

import numpy as np

A = np.array([[1.00, 0.0, 1.03, 1.18],
              [0.0, 1.58, 0.0, 7.59],
              [1.00, 1.22, 1.07, 1.03]])

In addition, I have the array B:

B = np.array([[1.00, 2.00, 27.00, 10.00],
              [3.00, 9.00, 6.00, 2.00],
              [2.00, 6.00, 4.00, 15.00]])

I need to identify the position ([i,j]) of every zero in A, column by column (as if the array were viewed as a dataframe, just to clarify my point), then go to B and perform a certain operation (a sum, or any other math formula) on the values at the same [i,j] positions. I don't know how to do it with arrays.

What I have done so far: I solved this by building a new array (C) that holds the i-th column (viewed as a dataframe) from A and from B, then deleting the rows where the first column is zero and performing the operation, all inside a loop. I know this is not the most efficient way to do it. I also tried converting the arrays to dataframes (and then applying loc), but I prefer to use arrays for data manipulation. Finally, I tried this, but got the following error message: arrays used as indices must be of integer (or boolean) type

I would like to learn a new approach to my task. Thank you very much.

What is the certain operation? Also, what do you mean by identify all zeros by column?
– Dani Mesejo, Commented Dec 28, 2018 at 22:18
@DanielMesejo I edited the post to answer your questions. Thanks.
– Pithit, Commented Dec 28, 2018 at 22:28
Do you want to count the columns where all the elements are zeros?
– Dani Mesejo, Commented Dec 28, 2018 at 22:36
@DanielMesejo More details are in the post. Thanks for your comments (they help me clarify even more what I am looking for).
– Pithit, Commented Dec 28, 2018 at 23:58

tel · Accepted Answer · 2018-12-28 23:25:33Z

Solution: use a masked array

Considering the form of your A and B, the easiest way to do the calculations you want is via a masked array. First, you create a new masked array with the data from B, masked at all of the locations where A==0:

marr = np.ma.masked_array(B, A==0)
print(f'the masked array looks like\n{marr}\n')

Output:

the masked array looks like
[[1.0 -- 27.0 10.0]
 [-- 9.0 -- 2.0]
 [2.0 6.0 4.0 15.0]]
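
For anyone who also wants the explicit [i,j] positions the question asks about, here is a minimal, self-contained sketch. It assumes the A and B arrays from the question; the names `zero_mask`, `positions` and `values_at_zeros` are only illustrative. A boolean mask such as `A == 0` is a valid index, whereas using a float array as an index typically triggers the "arrays used as indices must be of integer (or boolean) type" error.

```python
import numpy as np

A = np.array([[1.00, 0.0, 1.03, 1.18],
              [0.0, 1.58, 0.0, 7.59],
              [1.00, 1.22, 1.07, 1.03]])
B = np.array([[1.00, 2.00, 27.00, 10.00],
              [3.00, 9.00, 6.00, 2.00],
              [2.00, 6.00, 4.00, 15.00]])

# Boolean mask: True wherever A is exactly zero
zero_mask = A == 0

# [i, j] index pairs of the zeros in A, in row-major order
positions = np.argwhere(zero_mask)

# Values of B at those same positions (boolean indexing)
values_at_zeros = B[zero_mask]

# The same mask drives the masked array used in the answer
marr = np.ma.masked_array(B, mask=zero_mask)

print(positions.tolist())   # [[0, 1], [1, 0], [1, 2]]
print(values_at_zeros)      # [2. 3. 6.]
```

Note that `A == 0` is an exact comparison. If the zeros in A come out of floating-point arithmetic, something like `np.isclose(A, 0)` may be a safer way to build the mask.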

operate on all columns in masked array at once without a loop

You can then perform various operations (sum, mean, cumprod, etc) on all of the masked columns at once like so:

colsums = marr.sum(axis=0)
colmeans = marr.mean(axis=0)

print(f'sum of each masked column\n{colsums}\n')
print(f'the mean of each masked column\n{colmeans}\n')

Output:

sum of each masked column
[3.0 15.0 31.0 27.0]

the mean of each masked column
[1.5 7.5 15.5 9.0]

Note that the mean of the first column is calculated as (1.0 + 2.0)/2. The mean method ignores the masked element(s) completely, the same as in the OP's original row deletion approach.
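
As a cross-check on those numbers, here is a sketch of a different technique that avoids masked arrays: replace the entries of B where A is zero with NaN, then use NumPy's NaN-aware reductions. This is an alternative, not the approach of the answer above, and the name `B_nan` is only illustrative. It assumes B is a float array, since NaN cannot be stored in an integer array.

```python
import numpy as np

A = np.array([[1.00, 0.0, 1.03, 1.18],
              [0.0, 1.58, 0.0, 7.59],
              [1.00, 1.22, 1.07, 1.03]])
B = np.array([[1.00, 2.00, 27.00, 10.00],
              [3.00, 9.00, 6.00, 2.00],
              [2.00, 6.00, 4.00, 15.00]])

# Copy of B with NaN wherever A is zero; B itself is left untouched
B_nan = np.where(A == 0, np.nan, B)

# NaN-aware reductions skip the NaN entries in each column
colsums = np.nansum(B_nan, axis=0)
colmeans = np.nanmean(B_nan, axis=0)

print(colsums)    # [ 3. 15. 31. 27.]
print(colmeans)   # [ 1.5  7.5 15.5  9. ]
```

One difference to keep in mind: if a whole column of A were zero, `np.nanmean` would return NaN for that column and emit a RuntimeWarning.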

loop over the masked columns

If you want to perform some calculation for which there is no built-in NumPy method like sum or mean, you can instead iterate over the masked columns and operate on each in turn like so:

colmeans = [col.mean() for col in marr.T]
print(f'the result of iterating over the masked columns and taking the mean of each\n{colmeans}\n')

Output:

the result of iterating over the masked columns and taking the mean of each
[1.5, 7.5, 15.5, 9.0]
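
To illustrate the loop with an operation you write yourself, here is a minimal sketch. The `value_range` function is a hypothetical stand-in for "any other math formula" and is not part of NumPy. It uses the masked array's `compressed()` method, which returns only the unmasked values as a plain 1-D array, so the custom function never sees the masked entries.

```python
import numpy as np

A = np.array([[1.00, 0.0, 1.03, 1.18],
              [0.0, 1.58, 0.0, 7.59],
              [1.00, 1.22, 1.07, 1.03]])
B = np.array([[1.00, 2.00, 27.00, 10.00],
              [3.00, 9.00, 6.00, 2.00],
              [2.00, 6.00, 4.00, 15.00]])

marr = np.ma.masked_array(B, mask=(A == 0))


def value_range(col):
    """Hypothetical custom operation: max minus min of the unmasked values."""
    values = col.compressed()  # plain ndarray holding only the unmasked entries
    return values.max() - values.min()


# Iterating over the transpose yields one masked column at a time
colranges = [float(value_range(col)) for col in marr.T]
print(colranges)   # [1.0, 3.0, 23.0, 13.0]
```

If a column could be fully masked, `compressed()` would return an empty array and `max()` would raise a ValueError, so a real function would probably need to handle that case.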
