
I have two arrays and would like to use one of them as a reference for the other. How can I do this? I have the following array A:

import numpy as np

A = np.array([[1.00, 0.0, 1.03, 1.18],
              [0.0, 1.58, 0.0, 7.59],
              [1.00, 1.22, 1.07, 1.03]])

In addition, I have the array B:

B = np.array([[1.00, 2.00, 27.00, 10.00],
              [3.00, 9.00, 6.00, 2.00],
              [2.00, 6.00, 4.00, 15.00]])

I need to identify the position ([i, j]) of every zero in A, column by column (think of the array as a dataframe, just to clarify my point), then go to B and perform some operation (a sum, or any other math formula) at the same [i, j]. I don't know how to do it with arrays.

What I have done so far: I solved this by building a new array (C) that holds the i-th columns (viewed as a dataframe) of A and B, deleting the rows where the first column is zero, and then performing the operation, all in a loop. I know this is not the most efficient way to do it. I also tried converting the arrays to a dataframe (and then applying loc), but I prefer to use arrays for data manipulation. Finally, I tried this, but the following message pops up: arrays used as indices must be of integer (or boolean) type

I would like to learn a new approach to my task. Thank you very much.

  • What is the certain operation? Also what do you mean by identify all zeros by column? Commented Dec 28, 2018 at 22:18
  • @DanielMesejo I edited the post to answer your questions. Thanks. Commented Dec 28, 2018 at 22:28
  • Could you add what will be the output for your A matrix? Commented Dec 28, 2018 at 22:30
  • Do you want to count the columns where all the elements are zeros? Commented Dec 28, 2018 at 22:36
  • @DanielMesejo more details in the post. Thanks for your comments (they help me to clarify even more what I am looking for) Commented Dec 28, 2018 at 23:58

1 Answer


Solution: use a masked array

Considering the form of your A and B, the easiest way to do the calculations you want is via a masked array. First, you create a new masked array with the data from B, masked at all of the locations where A==0:

marr = np.ma.masked_array(B, A==0)
print(f'the masked array looks like\n{marr}\n')

Output:

the masked array looks like
[[1.0 -- 27.0 10.0]
 [-- 9.0 -- 2.0]
 [2.0 6.0 4.0 15.0]]
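To tie the mask back to the [i, j] positions asked about in the question, here is a minimal sketch. The names mask, zero_positions and b_at_zeros are made up for this illustration. It indexes B with the boolean array A == 0 rather than with A itself; indexing with a float array is one likely cause of the "arrays used as indices must be of integer (or boolean) type" error, though that is only a guess, since the failing code is not shown.

```python
import numpy as np

A = np.array([[1.00, 0.0, 1.03, 1.18],
              [0.0, 1.58, 0.0, 7.59],
              [1.00, 1.22, 1.07, 1.03]])
B = np.array([[1.00, 2.00, 27.00, 10.00],
              [3.00, 9.00, 6.00, 2.00],
              [2.00, 6.00, 4.00, 15.00]])

# Boolean mask: True wherever A is exactly zero.
mask = (A == 0)

# One [i, j] pair per zero in A, listed in row-major order.
zero_positions = np.argwhere(mask)

# Values of B at those same [i, j] locations (boolean indexing).
b_at_zeros = B[mask]

# The masked array hides exactly these locations of B.
marr = np.ma.masked_array(B, mask)

print(zero_positions)
print(b_at_zeros)
```

If A comes out of a floating-point calculation rather than being typed in, np.isclose(A, 0) may be a safer test than A == 0.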

Operate on all columns of the masked array at once, without a loop

You can then perform various operations (sum, mean, cumprod, etc.) on all of the masked columns at once, like so:

colsums = marr.sum(axis=0)
colmeans = marr.mean(axis=0)

print(f'sum of each masked column\n{colsums}\n')
print(f'the mean of each masked column\n{colmeans}\n')

Output:

sum of each masked column
[3.0 15.0 31.0 27.0]

the mean of each masked column
[1.5 7.5 15.5 9.0]

Note that the mean of the first column is calculated as (1.0 + 2.0)/2. The mean method ignores the masked element(s) completely, the same as in the OP's original row deletion approach.
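As a cross-check, the same column sums and means can be reproduced with plain boolean arithmetic and no masked array. This is only a sketch for comparison, not a replacement for the approach above; keep, sums, counts and means are illustrative names.

```python
import numpy as np

A = np.array([[1.00, 0.0, 1.03, 1.18],
              [0.0, 1.58, 0.0, 7.59],
              [1.00, 1.22, 1.07, 1.03]])
B = np.array([[1.00, 2.00, 27.00, 10.00],
              [3.00, 9.00, 6.00, 2.00],
              [2.00, 6.00, 4.00, 15.00]])

# True where the element of B should be kept (A is non-zero there).
keep = (A != 0)

# Replace the excluded elements with 0 so they do not contribute to the sum.
sums = np.where(keep, B, 0.0).sum(axis=0)

# Number of kept elements in each column.
counts = keep.sum(axis=0)

# Mean over the kept elements only (assumes no column is entirely excluded).
means = sums / counts

print(sums)
print(means)
```

This works for sums and means because an excluded element can be replaced by a neutral value; for operations without an obvious neutral value, the masked array is the more convenient tool.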

Loop over the masked columns

If you want to perform a calculation for which there is no built-in NumPy method like sum or mean, you can instead iterate over the masked columns and operate on each in turn, like so:

colmeans = [col.mean() for col in marr.T]
print(f'the result of iterating over the masked columns and taking the mean of each\n{colmeans}\n')

Output:

the result of iterating over the masked columns and taking the mean of each
[1.5, 7.5, 15.5, 9.0]
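
If the per-column calculation is a function of your own that does not understand masked arrays, one option is to hand it only the unmasked values via the compressed() method. A minimal sketch, where spread is a hypothetical stand-in for whatever formula you actually need:

```python
import numpy as np

A = np.array([[1.00, 0.0, 1.03, 1.18],
              [0.0, 1.58, 0.0, 7.59],
              [1.00, 1.22, 1.07, 1.03]])
B = np.array([[1.00, 2.00, 27.00, 10.00],
              [3.00, 9.00, 6.00, 2.00],
              [2.00, 6.00, 4.00, 15.00]])

marr = np.ma.masked_array(B, A == 0)


def spread(values):
    """Hypothetical custom formula: largest minus smallest value."""
    return float(values.max() - values.min())


# compressed() returns a plain 1-D ndarray holding only the unmasked entries,
# so the custom function never sees the elements where A == 0.
results = [spread(col.compressed()) for col in marr.T]

print(results)
```

A column that is masked everywhere would give an empty array here, so a real function would need to handle that case.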