The issue

I have a very simple function which uses numpy.where() for a very simple calculation.

  1. If the input is a scalar, the output is a numpy array of shape ().
  2. If I multiply it by one, it becomes a numpy int32 with the same shape.

My questions are:

What is the difference between 1 and 2?

  • Are they both scalars? Does such a thing as a numpy scalar even exist?
  • Why does multiplying it by one change the type? Is this a known, documented feature/bug?
  • Why do other numpy functions, e.g. np.arange(5,6), return an array of shape (1,) instead?

I doubt I am the first one to come across this but I haven't found much online.

I have found questions on the difference between an array of shape (n,) and one of shape (n,1), but that's a different matter.

Toy example:

import numpy as np

def my_find(a):
    return np.where(a == 0, 1, 0)

out_scalar = my_find(5)
out_scalar_times_1 = 1 * out_scalar

print("A scalar input returns an output of type:")
print(type(out_scalar))
print("and of shape")
print(out_scalar.shape)

print("")
print("Multiplying it by 1 returns a:")
print(type(out_scalar_times_1))

out_array = my_find(np.arange(0,5))

[Image: Spyder screenshot]

1 Answer

Yes, there is such a thing as a numpy scalar

https://numpy.org/doc/stable/reference/arrays.scalars.html

A numpy array can have 0,1,2 or more dimensions. There's a lot of overlap between

np.int64(3)          # numpy int
np.array(3)          # 0d array
np.array([3])        # 1d array with 1 element
int(3)               # python int (np.int was only an alias for it, removed in NumPy 1.24)
3                    # python int

The first 3 have array attributes like shape and dtype. The differences between the first two are minor.
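
One way to see that overlap is to print the same attributes for each object. This is only a small sketch; the variable names are made up for the illustration.

```python
import numpy as np

np_scalar = np.int64(3)     # numpy scalar
zero_d = np.array(3)        # 0d array
one_d = np.array([3])       # 1d array with 1 element
py_int = 3                  # plain python int

for obj in (np_scalar, zero_d, one_d):
    # all three expose the array-style attributes
    print(type(obj).__name__, obj.shape, obj.ndim, obj.size, obj.dtype)

# the python int has none of them
print(hasattr(py_int, "shape"))

# np.isscalar tells the numpy scalar and the 0d array apart
print(np.isscalar(np_scalar), np.isscalar(zero_d))
```

The numpy scalar and the 0d array both report shape () and size 1; only the 1d array has an axis.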

In a function like where, numpy first converts the arguments to arrays, e.g. np.array(5), np.array(1):

In [161]: np.where(5, 1, 0)
Out[161]: array(1)
In [162]: _.shape
Out[162]: ()
In [163]: np.array(5)
Out[163]: array(5)
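
Applied to the my_find function from the question, that conversion is what produces the shape () result. A minimal sketch (the result names are just for illustration):

```python
import numpy as np

def my_find(a):
    return np.where(a == 0, 1, 0)

# with a python int, a == 0 is a plain bool, which where treats as a 0d array
print(np.asarray(5 == 0).shape)

scalar_result = my_find(5)
array_result = my_find(np.arange(5))

print(type(scalar_result), scalar_result.shape)   # ndarray with an empty shape
print(type(array_result), array_result.shape)     # ndarray with one axis
```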

But math like addition with a scalar may return a numpy scalar:

In [164]: np.array(5) + 1
Out[164]: 6
In [165]: type(_)
Out[165]: numpy.int64
In [166]: np.array(5) * 1
Out[166]: 5
In [167]: type(_)
Out[167]: numpy.int64

Indexing an array can also produce such a scalar:

In [182]: np.arange(3)[1]
Out[182]: 1
In [183]: type(_)
Out[183]: numpy.int64
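
To tell the two kinds of result apart in code, isinstance checks against np.ndarray and np.generic (the base class of the numpy scalar types) should work. A short sketch, with np.asarray used to wrap the scalar back into a 0d array if that is what is needed:

```python
import numpy as np

zero_d = np.array(5)
product = zero_d * 1

print(isinstance(zero_d, np.ndarray))     # 0d array
print(isinstance(product, np.generic))    # numpy scalar
print(isinstance(product, np.ndarray))    # not an ndarray any more

# wrapping the scalar again gives back a 0d array
restored = np.asarray(product)
print(type(restored), restored.shape)
```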

where 'broadcasts' its arguments against each other, so the result has the broadcast shape of all three, i.e. the "largest" one in the broadcasting sense:

In [168]: np.where(np.arange(5),1,0)
Out[168]: array([0, 1, 1, 1, 1])
In [173]: np.where(5, [1],0)
Out[173]: array([1])
In [174]: np.where(0, [1],0)
Out[174]: array([0])
In [175]: np.where([[0]], [1],0)
Out[175]: array([[0]])
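
The shape can be predicted without calling where at all. The helper below is hypothetical (not a numpy function); it just feeds the three argument shapes to np.broadcast_shapes, which I believe needs NumPy 1.20 or later.

```python
import numpy as np

def where_shape(cond, x, y):
    """Shape that np.where(cond, x, y) is expected to return, by broadcasting."""
    return np.broadcast_shapes(np.shape(cond), np.shape(x), np.shape(y))

cases = [
    (5, 1, 0),             # all scalars
    (np.arange(5), 1, 0),  # 1d condition
    (5, [1], 0),           # one argument has an axis of length 1
    ([[0]], [1], 0),       # 2d condition
]
for cond, x, y in cases:
    print(where_shape(cond, x, y), np.where(cond, x, y).shape)
```

So an all-scalar call gives an empty shape, and a single argument with an axis is enough to give the result that axis.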

If spyder has tab completion like ipython, you can get a list of all the methods attached to an object. The methods for an np.int64(3) will look a lot like those for np.array(3), but very different from those for 3.

There are also arrays with 0 elements - if one of the dimensions is 0

In [184]: np.arange(0)
Out[184]: array([], dtype=int64)
In [185]: _.shape
Out[185]: (0,)
In [186]: np.arange(1)
Out[186]: array([0])
In [187]: _.shape
Out[187]: (1,)

Obviously a 0d array can't have 0 elements, because it has no dimension that could be of length 0.
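
The relation between shape and size can be checked directly: size is the product of the entries in shape, and the product over an empty shape is 1. A small sketch (the list of example arrays is my own choice):

```python
import math
import numpy as np

examples = [np.array(3), np.arange(1), np.arange(0), np.zeros((2, 0))]

for arr in examples:
    # size equals the product of the shape entries; an empty shape gives 1
    print(arr.shape, arr.ndim, arr.size, math.prod(arr.shape))
```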

Indexing a 0d array (or numpy scalar) is a bit trickier (but still logical):

In [189]: np.array(3)[()]      # 0 element indexing tuple
Out[189]: 3
In [190]: type(_)
Out[190]: numpy.int64
In [191]: np.array(3).item()
Out[191]: 3
In [192]: type(_)
Out[192]: int
In [193]: np.array(3)[()][()]
Out[193]: 3
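
Put side by side, the ways of getting at the single value of a 0d array look like this. It is a sketch only, and the variable names are illustrative:

```python
import numpy as np

zero_d = np.array(3)

as_np_scalar = zero_d[()]    # empty tuple index -> numpy scalar
as_py_int = zero_d.item()    # .item() -> plain python int
as_view = zero_d[...]        # Ellipsis index -> still a 0d array

print(type(as_np_scalar), type(as_py_int), type(as_view))

# an integer index fails, because there is no axis to index
try:
    zero_d[0]
except IndexError as err:
    print("IndexError:", err)
```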

The return type of the addition might be explained by __array_priority__ (though, as checked further down, the priorities point the other way).

dtype is not preserved in operations like this. Add a float to an int, and get a float.

In [203]: type(np.array(3, np.int16) + 3)
Out[203]: numpy.int64
In [204]: type(np.array(3, np.int16) + 3.0)
Out[204]: numpy.float64
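
np.result_type reports the dtype that numpy would pick when combining two dtypes, which makes the promotion visible without doing any arithmetic. The sketch sticks to explicit dtypes on purpose: NumPy 2 revised how bare Python literals take part in promotion, so the int16 + 3 result shown above may differ on a recent install.

```python
import numpy as np

pairs = [
    (np.int16, np.int64),
    (np.int16, np.float64),
    (np.int32, np.float32),
]
for left, right in pairs:
    # the common dtype that can hold values of both inputs
    print(np.dtype(left), np.dtype(right), "->", np.result_type(left, right))
```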

ufunc casting

+ is actually a call to the np.add ufunc. Ufuncs take keyword arguments such as casting that give finer control over what the result can be:

In [214]: np.add(np.array(3, np.int16), 3)
Out[214]: 6
In [215]: np.add(np.array(3, np.int16), 3, casting='no')
Traceback (most recent call last):
  File "<ipython-input-215-631cb3a3b303>", line 1, in <module>
    np.add(np.array(3, np.int16), 3, casting='no')
UFuncTypeError: Cannot cast ufunc 'add' input 0 from dtype('int16') to dtype('int64') with casting rule 'no'
    
In [217]: np.add(np.array(3, np.int16), 3, casting='safe')
Out[217]: 6

https://numpy.org/doc/stable/reference/ufuncs.html#output-type-determination
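
np.can_cast answers the same question the ufunc asks internally, so the casting rules can be tried out without triggering an error. A sketch, assuming nothing beyond np.can_cast and np.add:

```python
import numpy as np

print(np.can_cast(np.int16, np.int64, casting="safe"))       # widening is safe
print(np.can_cast(np.int16, np.int64, casting="no"))         # but it is still a cast
print(np.can_cast(np.int64, np.int16, casting="safe"))       # narrowing is not safe
print(np.can_cast(np.int64, np.int16, casting="same_kind"))  # allowed within integers

# with the same dtype on both sides nothing has to be cast,
# so casting='no' goes through and the dtype is kept
a = np.array(3, np.int16)
b = np.array(3, np.int16)
total = np.add(a, b, casting="no")
print(total, total.dtype)
```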

I was speculating that __array_priority__ played a role in returning a np.int64, but priorities go the wrong way.

In [194]: np.array(3).__array_priority__
Out[194]: 0.0
In [195]: np.int64(3).__array_priority__
Out[195]: -1000000.0
In [196]: np.array(3) + np.int64(3)
Out[196]: 6
In [197]: type(_)
Out[197]: numpy.int64

I don't know where it's documented, but often an operation will return a numpy scalar rather than a 0d array.

I just remembered/discovered one difference between a 0d array and a numpy scalar: mutability.

In [222]: x
Out[222]: array(3)
In [223]: x[...] = 4
In [224]: x
Out[224]: array(4)
In [225]: x = np.int64(3)
In [226]: x[...] = 4
Traceback (most recent call last):
  File "<ipython-input-226-f7dca2cc5565>", line 1, in <module>
    x[...] = 4
TypeError: 'numpy.int64' object does not support item assignment
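
A practical consequence is that a 0d array can be changed through a second reference to it, while a numpy scalar can only be replaced. A minimal sketch (alias is just a second name introduced for the illustration):

```python
import numpy as np

zero_d = np.array(3)
alias = zero_d           # second name for the same 0d array
zero_d[...] = 4          # in-place write, visible through both names
print(alias)

zero_d += 1              # in-place add keeps the same 0d array object
print(alias, alias is zero_d)

scalar = np.int64(3)
try:
    scalar[...] = 4      # numpy scalars do not support item assignment
except TypeError as err:
    print("TypeError:", err)
```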

Python classes can share a lot of behaviors/methods, but differ in others.

2 Comments

Very insightful, thanks. However, I am still not clear on why multiplying it by 1 returns a scalar, effectively changing the data type. Nor why other numpy functions return an array of size (1,)
Size 1 or shape (1,)? np.int64(3), np.array(3), np.array([3]) all have size 1.