I'm trying to compare each row of a numpy array with the whole numpy array without using iteration.

>>> sample = np.array([[1,2,3],[4,5,6]])
>>> sample
array([[1, 2, 3],
       [4, 5, 6]])

First I reshape the 2D-array to a 3D-array:

>>> sample2=sample.reshape(sample.shape[0],1,sample.shape[1])

And then with the following line of code I can compare the rows:

>>> sample2 == sample
array([[[ True,  True,  True],
        [False, False, False]],

       [[False, False, False],
        [ True,  True,  True]]])

...which is the result that I'm looking for.

But this does not work with large numpy arrays:

>>> sample3 = np.random.randint(low= 0, high = 2, size = 30000000).reshape(30000,1000)
>>> sample4 = sample3.reshape(sample3.shape[0],1,sample3.shape[1])
>>> sample4 == sample3  
<ipython-input-229-e1d55c6bb1ca>:1: DeprecationWarning: elementwise
comparison failed; this will raise an error in the future.
False

How can I solve this?

  • Did you mean to do sample4 == sample3 instead of sample4 == sample? Commented Mar 19, 2022 at 17:17
  • @richardec Thank you for the edit and comment, yes I corrected the last line Commented Mar 19, 2022 at 17:19
  • Are you sure sample4 == sample3 raises an error? Commented Mar 19, 2022 at 17:20
  • Also, note that instead of sample2=sample.reshape(sample.shape[0],1,sample.shape[1]), you can just do sample2 = sample[:, None]. Same with sample4: sample4 = sample3[:, None] Commented Mar 19, 2022 at 17:21
  • yes, sample4 == sample3 raises an error Commented Mar 19, 2022 at 17:23
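
The indexing shorthand suggested in the comments can be checked against the original reshape on the small array. This is just a quick sketch; the names `reshaped` and `indexed` are mine and not from the question:

```python
import numpy as np

sample = np.array([[1, 2, 3], [4, 5, 6]])

# The reshape used in the question.
reshaped = sample.reshape(sample.shape[0], 1, sample.shape[1])

# Indexing with None inserts a new axis of length 1 at that position.
indexed = sample[:, None]

print(reshaped.shape, indexed.shape)       # (2, 1, 3) (2, 1, 3)
print(np.array_equal(reshaped, indexed))   # True
```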

1 Answer

This may shed some light on your question. Here is my code sample, based on yours:

import numpy as np
n=30000000
ny = 1000
sample3 = np.random.randint(low= 0, high = 2, size = n).reshape(n // ny, ny)
sample4 = sample3.reshape(sample3.shape[0],1,sample3.shape[1])
print(sample3.shape, sample4.shape)
test = sample4 == sample3
print(test)
test = np.equal(sample4, sample3)
print(test)

Its output is:

(30000, 1000) (30000, 1, 1000)
C:\Users\XYZ\python\code_sample.py:7: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  test = sample4 == sample3
False
Traceback (most recent call last):
  File "code_sample.py", line 9, in <module>
    test = np.equal(sample4, sample3)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 838. GiB for an array with shape (30000, 30000, 1000) and data type bool

Also, here are the docs for numpy.equal(), which is presumably used by the == operator for numpy arrays. They state:

Input arrays. If x1.shape != x2.shape, they must be broadcastable to a common shape (which becomes the shape of the output).

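So it looks like equal() has to allocate the full broadcast result: shapes (30000, 1, 1000) and (30000, 1000) broadcast to (30000, 30000, 1000), and at one byte per bool that comes to about 838 GiB, as the error message above shows. Perhaps == runs into the same allocation failure and gives the deprecation warning (rather than something more apt, such as an out-of-memory error) when it realizes there's not enough memory?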
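
As a rough check of that figure, the size of the broadcast result can be computed without allocating anything. This is only a sketch: it assumes NumPy 1.20 or later (for np.broadcast_shapes) and Python 3.8 or later (for math.prod), and the variable names are mine:

```python
import math
import numpy as np

# Shapes from the example above; nothing of this size is allocated here.
shape4 = (30000, 1, 1000)   # sample4
shape3 = (30000, 1000)      # sample3

# Shape that == / np.equal would have to produce after broadcasting.
result_shape = np.broadcast_shapes(shape4, shape3)

# A boolean array stores one byte per element.
n_bytes = math.prod(result_shape) * np.dtype(bool).itemsize
gib = n_bytes / 2**30

print(result_shape)       # (30000, 30000, 1000)
print(f"{gib:.1f} GiB")   # 838.2 GiB
```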

Also, if I reduce n from 30000000 to 3000000 and comment out the call to equal(), execution of the == statement takes 10 or 20 seconds before the following result is printed:

(3000, 1000) (3000, 1, 1000)
[[[ True  True  True ...  True  True  True]
  [False  True  True ...  True  True  True]
  [ True  True  True ...  True  True  True]
  ...
  [False  True  True ...  True False  True]
  [False False  True ...  True  True False]
  [ True False False ... False False False]]

 [[False  True  True ...  True  True  True]
  [ True  True  True ...  True  True  True]
  [False  True  True ...  True  True  True]
  ...
  [ True  True  True ...  True False  True]
  [ True False  True ...  True  True False]
  [False False False ... False False False]]

 [[ True  True  True ...  True  True  True]
  [False  True  True ...  True  True  True]
  [ True  True  True ...  True  True  True]
  ...
  [False  True  True ...  True False  True]
  [False False  True ...  True  True False]
  [ True False False ... False False False]]

 ...

 [[False  True  True ...  True False  True]
  [ True  True  True ...  True False  True]
  [False  True  True ...  True False  True]
  ...
  [ True  True  True ...  True  True  True]
  [ True False  True ...  True False False]
  [False False False ... False  True False]]

 [[False False  True ...  True  True False]
  [ True False  True ...  True  True False]
  [False False  True ...  True  True False]
  ...
  [ True False  True ...  True False False]
  [ True  True  True ...  True  True  True]
  [False  True False ... False False  True]]

 [[ True False False ... False False False]
  [False False False ... False False False]
  [ True False False ... False False False]
  ...
  [False False False ... False  True False]
  [False  True False ... False False  True]
  [ True  True  True ...  True  True  True]]]

So it looks like the issue you've encountered is probably related to running out of memory.
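
If you don't actually need the whole 3D result in memory at once, one possible workaround is to compare a block of rows at a time and reduce each block right away. The sketch below is an illustration under assumptions, not a drop-in fix: the function name iter_row_comparisons and the chunk size are hypothetical, the small test array stands in for your data, and I'm guessing that a per-row-pair reduction ("are these two rows identical?") is what you're after:

```python
import numpy as np

def iter_row_comparisons(arr, chunk_rows=64):
    """Yield (start, block), where
    block[i, j, k] == (arr[start + i, k] == arr[j, k]).

    Only one block of shape (<=chunk_rows, n_rows, n_cols) exists at a time.
    """
    for start in range(0, arr.shape[0], chunk_rows):
        block = arr[start:start + chunk_rows, None, :] == arr[None, :, :]
        yield start, block

# Small stand-in for the large array in the question.
rng = np.random.default_rng(0)
small = rng.integers(0, 2, size=(300, 100))

# Example reduction: for each pair of rows, are all elements equal?
n_rows = small.shape[0]
rows_equal = np.empty((n_rows, n_rows), dtype=bool)
for start, block in iter_row_comparisons(small, chunk_rows=64):
    rows_equal[start:start + block.shape[0]] = block.all(axis=2)

print(rows_equal.shape)
```

Each block takes roughly chunk_rows * n_rows * n_cols bytes, so chunk_rows would need to be chosen to fit the available memory. For 30000 rows, the reduced (30000, 30000) boolean result is still about 900 MB.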

1 Comment

@Mahdi Baghbanzadeh Did this answer help with your question?
