Unexpected behavior when trying to normalize a column in numpy.array (version 1.17.4)

Question

So, I was trying to normalize a specific column of a NumPy array (i.e. divide every value in the column by the column's maximum, so that the max becomes 1). I hoped this piece of code would do the trick:

import numpy as np
bar = np.arange(12).reshape(6,2)

bar
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])

bar[:,1] = bar[:,1] / bar[:,1].max()
bar
array([[ 0,  0],
       [ 2,  0],
       [ 4,  0],
       [ 6,  0],
       [ 8,  0],
       [10,  1]])

It works as expected if the values are floats:

foo = np.array([[1.1,2.2],
               [3.3,4.4],
               [5.5,6.6]])
foo[:,1] = foo[:,1] / foo[:,1].max()

foo
array([[1.1       , 0.33333333],
       [3.3       , 0.66666667],
       [5.5       , 1.        ]])

I guess what I'm asking is: where does this default 'int' come from? (I'm taking this as a learning opportunity.)

Mercury · Accepted Answer · 2020-04-09 18:52:56Z


If you simply execute:

out = bar[:,1] / bar[:,1].max()
print(out)
# [0.09090909 0.27272727 0.45454545 0.63636364 0.81818182 1.        ]

This works just fine, because the division creates a new float array (out) to hold the float results. But np.arange(12) gives you an int array by default. The line bar[:,1] = bar[:,1] / bar[:,1].max() writes those float values back into the existing integer array, so NumPy casts them to integers (truncating toward zero), and you get [0 0 0 0 0 1].
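A minimal sketch of that cast, using only NumPy. The exact integer dtype of bar (int64 or int32) depends on the platform, so the sketch only checks that it is *an* integer dtype:

```python
import numpy as np

bar = np.arange(12).reshape(6, 2)             # integer dtype by default
col = bar[:, 1] / bar[:, 1].max()             # true division returns a new float64 array
print(col.dtype)                              # float64

bar[:, 1] = col                               # assignment casts the floats to bar's integer dtype
print(np.issubdtype(bar.dtype, np.integer))   # True: bar is still an integer array
print(bar[:, 1])                              # [0 0 0 0 0 1]
```

Because the float-to-int cast truncates toward zero, only the maximum (exactly 1.0) survives as 1; every other value in (0, 1) becomes 0.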

To create the array as float from the start:

bar = np.arange(12, dtype='float').reshape(6,2)

Alternatively, you can also use:

bar = np.arange(12).reshape(6,2).astype('float')
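A quick sketch checking that either way of getting a float array makes the original normalization line behave; it uses only the two calls shown in this answer:

```python
import numpy as np

# Option 1: request a float dtype when the array is created
bar1 = np.arange(12, dtype='float').reshape(6, 2)

# Option 2: create an int array, then convert it with .astype()
bar2 = np.arange(12).reshape(6, 2).astype('float')

for arr in (bar1, bar2):
    # The column is float, so the normalized values are stored without truncation
    arr[:, 1] = arr[:, 1] / arr[:, 1].max()

print(bar1[:, 1])
print(np.array_equal(bar1, bar2))   # True: both routes give the same result
```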

It isn't uncommon to need to change an array's data type partway through a program, since the dtype you chose originally won't always be the one you need later. So .astype() is handy in all kinds of scenarios.
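If the same conversion is needed in many places, it can be wrapped in a small function. The name normalize_column is hypothetical (it is not a NumPy API), and this sketch assumes the column maximum is non-zero:

```python
import numpy as np

def normalize_column(arr, col):
    """Return a float copy of `arr` with column `col` divided by its maximum.

    Hypothetical helper for illustration; assumes the column max is non-zero.
    """
    out = arr.astype(float)                        # new float64 array; `arr` is untouched
    out[:, col] = out[:, col] / out[:, col].max()  # safe: the target is now float
    return out

bar = np.arange(12).reshape(6, 2)
result = normalize_column(bar, 1)
print(result.dtype)   # float64
print(bar[:, 1])      # [ 1  3  5  7  9 11]: the original integer array is unchanged
```

Returning a converted copy rather than modifying the input means the caller's integer array never gets silently truncated.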

2 Comments

Gal Sapir

Thank you for the quick and helpful reply. But I'm not supposed to leave a thank-you comment alone ;-), so I'm adding a follow-up question: is there an "efficient" or preferred way to go, assuming I have to repeat this over a large number of arrays? Reading between the lines, it seems that the last method you suggested is the one you would choose. Is that so?

Mercury

The first method creates the array with the requested dtype directly. The second method creates an int array and then converts it into a new float array, which is extra work (an additional allocation and copy). So the first method is more efficient. That said, .astype() can convert your array's type whenever needed, not only at creation time. So when you're making a new array, use the first method (dtype='...'), and when you need to change the array's type later in the code, use .astype('...').
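A sketch of the difference described in this comment, assuming only NumPy. Both routes produce the same float array, but .astype() returns a newly allocated copy by default; passing copy=False lets NumPy return the original array when no conversion is actually needed:

```python
import numpy as np

a = np.arange(12, dtype='float')       # created as float64 directly
b = np.arange(12).astype('float')      # temporary int array, then a converted copy

print(np.array_equal(a, b))            # True: same values and dtype

# .astype() allocates a new array by default, even if the dtype already matches
c = a.astype('float')
print(c is a)                          # False

# With copy=False and no conversion required, the input array itself is returned
d = a.astype('float', copy=False)
print(d is a)                          # True
```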

Bruno Mello · Accepted Answer · 2020-04-09 18:39:54Z


From the np.arange documentation:

dtype : dtype
The type of the output array. If dtype is not given, infer the data type from the other input arguments.

Since you passed int values, arange infers an integer dtype for the array, so it stays int and the normalized float values are truncated when they are assigned back into it. If you want floats, you can pass a float argument instead:

bar = np.arange(12.0).reshape(6,2)
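A minimal check of the inference rule quoted above. The integer width reported for np.arange(12) is platform-dependent, so the sketch only relies on it being an integer dtype:

```python
import numpy as np

print(np.arange(12).dtype)     # an integer dtype (e.g. int64), inferred from the int argument
print(np.arange(12.0).dtype)   # float64, inferred from the float argument

bar = np.arange(12.0).reshape(6, 2)
bar[:, 1] = bar[:, 1] / bar[:, 1].max()   # float column, so nothing is truncated
print(bar[:, 1].max())         # 1.0
```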

1 Comment

FBruzzesi

As a remark: the usual trick of dividing by a float (e.g. / 11.0 instead of / 11) won't work either, since it does not change the dtype of the array you are assigning into.
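A sketch of this remark, assuming the same 6x2 integer bar as in the question. Dividing by 11.0 still truncates on assignment, and the augmented form /= is rejected outright, because NumPy will not cast a float64 ufunc result into an integer array in place (it raises a TypeError subclass):

```python
import numpy as np

# Dividing by a float literal produces float results...
bar = np.arange(12).reshape(6, 2)
bar[:, 1] = bar[:, 1] / 11.0
truncated = bar[:, 1].copy()
print(truncated)                # [0 0 0 0 0 1]: ...which are truncated on assignment

# The in-place form refuses the float -> int cast instead of truncating silently
bar = np.arange(12).reshape(6, 2)
raised = False
try:
    bar[:, 1] /= bar[:, 1].max()
except TypeError as exc:        # NumPy raises a TypeError subclass here
    raised = True
    print("in-place division failed:", exc)
```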
