
I have a numpy array where each cell of a given row holds the value of a feature. I store all of them in a 100*4 matrix.

A     B   C
1000  10  0.5
765   5   0.35
800   7   0.09  

Any idea how I can normalize the rows of this numpy array so that each value ends up between 0 and 1?

My desired output is:

A     B    C
1     1    1
0.765 0.5  0.7
0.8   0.7  0.18 (which is 0.09/0.5)
  • Just to be clear: is it a NumPy array or a Pandas DataFrame?
  • When programming it's important to be specific: a set is a particular object in Python, and you can't have a set of numpy arrays. Python doesn't have a matrix, but numpy does, and that matrix type isn't the same as a numpy array/ndarray (which is itself different from Python's array type, which in turn is not the same as a list). And none of these are pandas DataFrames. (A short type-check sketch follows these comments.)
  • I do not think this is a complete normalization. I would look at stackoverflow.com/questions/9775765/… for a better definition of normalization.
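To illustrate the distinction drawn in that comment, a quick (purely illustrative) type check:

import array
import numpy as np

print(type([1, 2, 3]))                      # <class 'list'>
print(type(array.array('d', [1, 2, 3])))    # <class 'array.array'>
print(type(np.array([1, 2, 3])))            # <class 'numpy.ndarray'>
print(type(np.matrix([[1, 2, 3]])))         # <class 'numpy.matrix'> (generally discouraged)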

2 Answers


If I understand correctly, what you want to do is divide by the maximum value in each column. You can do this easily using broadcasting.

Starting with your example array:

import numpy as np

x = np.array([[1000,  10,   0.5],
              [ 765,   5,  0.35],
              [ 800,   7,  0.09]])

x_normed = x / x.max(axis=0)

print(x_normed)
# [[ 1.     1.     1.   ]
#  [ 0.765  0.5    0.7  ]
#  [ 0.8    0.7    0.18 ]]

x.max(0) takes the maximum over the 0th dimension (i.e. rows). This gives you a vector of size (ncols,) containing the maximum value in each column. You can then divide x by this vector in order to normalize your values such that the maximum value in each column will be scaled to 1.
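
To make the intermediate step concrete, here is a small check of the column-maximum vector that gets broadcast (continuing from the x defined above):

col_max = x.max(axis=0)   # same as x.max(0): the maximum of each column
print(col_max)            # the three column maxima: 1000., 10., 0.5
print(col_max.shape)      # (3,) -- one value per column

# Broadcasting stretches this length-3 vector across every row of x,
# so each column is divided by its own maximum.
print(x / col_max)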


If x contains negative values you would need to subtract the minimum first:

x_normed = (x - x.min(0)) / x.ptp(0)

Here, x.ptp(0) returns the "peak-to-peak" (i.e. the range, max - min) along axis 0. This normalization also guarantees that the minimum value in each column will be 0.
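
As a quick check with negative values (an illustrative array, not from the question), the same one-liner maps every column onto [0, 1]; np.ptp(y, axis=0) is simply the function form of y.ptp(0):

y = np.array([[-2.0, 10.0],
              [ 0.0, 20.0],
              [ 2.0, 30.0]])

y_normed = (y - y.min(0)) / np.ptp(y, axis=0)  # (y - column min) / (column range)

print(y_normed)
# [[ 0.   0. ]
#  [ 0.5  0.5]
#  [ 1.   1. ]]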


7 Comments

I really appreciate your answer; I always have issues dealing with "axis"!
For reductions (i.e. .max(), .min(), .sum(), .mean(), etc.), you just need to remember that axis specifies the dimension that you want to "collapse" during the reduction. If you want the maximum for each column, then you need to collapse the row dimension.
@rawbeans See my update. The reason I divided by the maximum is because that's what the OP showed in their example.
@ali_m, Would you please explain why you are saying "If x contains negative values"? If the minimum of the array is 100 and the maximum is 103, I think you should definitely use your second formula, otherwise your result will not have a 0 offset.
@GalacticKetchup You can easily extend this to reductions over arbitrary axes by passing keepdims=True to the reduction ufunc. This arg prevents the reduction axis from getting "squeezed out" so that broadcasting will still work correctly, e.g. x / x.max(axis=1, keepdims=True).
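
Following up on the keepdims comment above, a minimal sketch of the row-wise version (each row's maximum scales to 1), reusing the x from the answer:

# keepdims=True keeps the reduced axis as size 1, so the row maxima have
# shape (nrows, 1) and broadcast across the columns of x.
row_normed = x / x.max(axis=1, keepdims=True)

print(row_normed)  # row 0 becomes [1, 0.01, 0.0005], and so on (values rounded)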

You can use sklearn.preprocessing:

import numpy as np
from sklearn.preprocessing import normalize

data = np.array([
    [1000, 10, 0.5],
    [765, 5, 0.35],
    [800, 7, 0.09],
])
data = normalize(data, axis=0, norm='max')

print(data)
# [[ 1.     1.     1.   ]
#  [ 0.765  0.5    0.7  ]
#  [ 0.8    0.7    0.18 ]]

1 Comment

Any way to scale the column values between 1 and 2? Using MinMaxScaler?
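
One way to do that, sketched here on the array from this answer and assuming scikit-learn's MinMaxScaler with its feature_range parameter (which rescales each column into the given interval):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([
    [1000, 10, 0.5],
    [765, 5, 0.35],
    [800, 7, 0.09],
])

# feature_range=(1, 2) maps each column's minimum to 1 and its maximum to 2
scaler = MinMaxScaler(feature_range=(1, 2))
scaled = scaler.fit_transform(data)

print(scaled)  # e.g. the first column becomes [2., 1., 1.1489...]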
