
So, I was trying to normalize a specific column within a NumPy array (i.e. divide every value by the column's max, so that the max becomes 1). I hoped this piece of code would do the trick:

bar = np.arange(12).reshape(6,2)

bar
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])

bar[:,1] = bar[:,1] / bar[:,1].max()
bar
array([[ 0,  0],
       [ 2,  0],
       [ 4,  0],
       [ 6,  0],
       [ 8,  0],
       [10,  1]])

It works as expected if the values are of type 'float':

foo = np.array([[1.1,2.2],
               [3.3,4.4],
               [5.5,6.6]])
foo[:,1] = foo[:,1] / foo[:,1].max()

foo
array([[1.1       , 0.33333333],
       [3.3       , 0.66666667],
       [5.5       , 1.        ]])

I guess what I'm asking is: where does this default 'int' come from, and what am I missing here? (I'm taking this as a 'learning opportunity'.)

2 Answers


If you simply execute:

out = bar[:,1] / bar[:,1].max()
print(out)
>>> [0.09090909 0.27272727 0.45454545 0.63636364 0.81818182 1.        ]

This works just fine, since out is a newly created float array made to hold these float values. But np.arange(12) gives you an int array by default. bar[:,1] = bar[:,1] / bar[:,1].max() tries to store the float values inside the integer array, so each value is cast back to an integer (its fractional part is dropped) and you get [0 0 0 0 0 1].
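A minimal sketch of that explanation, checking the dtype on each side of the assignment. The name ratio is only illustrative, and the exact integer dtype (int32 or int64) depends on the platform, so the sketch prints the dtype kind instead:

```python
import numpy as np

bar = np.arange(12).reshape(6, 2)

# The right-hand side is a brand-new float array.
ratio = bar[:, 1] / bar[:, 1].max()
print(bar.dtype.kind, ratio.dtype)  # 'i' (integer) and float64

# Assigning into the integer array casts each float back to an integer,
# discarding the fractional part.
bar[:, 1] = ratio
print(bar[:, 1])  # [0 0 0 0 0 1]
```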

To create the array as float from the start:

bar = np.arange(12, dtype='float').reshape(6,2)

Alternatively, you can also use:

bar = np.arange(12).reshape(6,2).astype('float')

It isn't uncommon to need to change an array's data type partway through a program, since the dtype you chose originally may not always be the one you need. So .astype() is pretty handy in all kinds of scenarios.
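One detail worth keeping in mind, shown here as a small sketch: .astype() returns a new array rather than converting the existing one, so the result has to be assigned to a name. The names ints and floats are only illustrative.

```python
import numpy as np

ints = np.arange(12).reshape(6, 2)

# astype returns a new array; the original keeps its integer dtype.
floats = ints.astype(float)
floats[:, 1] = floats[:, 1] / floats[:, 1].max()

print(ints.dtype.kind)  # 'i': still an integer array
print(floats.dtype)     # float64
print(floats[:, 1])     # column 1 now holds fractions, with a maximum of 1.0
```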


2 Comments

Thank you for the quick and helpful reply. But I am not supposed to post a thank-you comment alone ;-), so I'm adding a follow-up question: is there an "efficient" or preferred way to go, assuming I have to repeat this over a large number of arrays? Reading between the lines, it seems that the last method you suggested is the one you would choose. Is that so?
The first method creates an array of type [dtype]. The second method creates an array of type [int], then converts the int array to a float array. The first method is more efficient. That being said, the .astype function can convert your array's type whenever needed, not only at creation. So when you're making a new array, use the first method (dtype='...'), and when you need to change the array's type later in the code, use .astype('...').
0

From the np.arange documentation:

dtype : dtype
The type of the output array. If dtype is not given, infer the data type from the other input arguments.

Since you passed int values, it infers that the values in the array are int, and assigning floats into the array later won't change that dtype. If you want a float array, you can pass a float argument instead:

bar = np.arange(12.0).reshape(6,2)

1 Comment

As a remark: the usual trick of dividing by a float won't help here either. The division itself produces floats, but assigning the result back into the integer array does not change the array's dtype.
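A short sketch of that remark. The first part follows directly from the explanation above; the second part, where the augmented form /= raises a TypeError instead of truncating, is what recent NumPy versions do as far as I know, so treat it as an assumption about your installed version. The flag in_place_failed is only illustrative.

```python
import numpy as np

bar = np.arange(12).reshape(6, 2)

# Dividing by a float gives a float result, but storing it in the integer
# column still drops the fractional part.
bar[:, 1] = bar[:, 1] / float(bar[:, 1].max())
print(bar[:, 1])  # [0 0 0 0 0 1]

# The augmented form would have to write floats into the integer column
# in place; recent NumPy versions refuse that cast.
try:
    bar[:, 1] /= 2.0
    in_place_failed = False
except TypeError as exc:
    in_place_failed = True
    print("in-place division failed:", exc)
```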
