How to normalize a numpy array to a unit vector

Question

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm

This function handles the situation where vector v has the norm value of 0.

Are there any similar functions provided in sklearn or numpy?

If this is really a concern, you should check for norm < epsilon, where epsilon is a small tolerance. In addition, I wouldn't silently pass back a zero-norm vector; I would raise an exception!
– Hooked, Commented Jan 9, 2014 at 20:51
My function works, but I would like to know whether there is something in the more common Python libraries. I am writing various machine learning functions and would like to avoid defining too many new functions, to keep the code clear and readable.
– Donbeo, Commented Jan 9, 2014 at 21:08
I did a few quick tests and found that x/np.linalg.norm(x) was not much slower (about 15-20%) than x/np.sqrt((x**2).sum()) in numpy 1.15.1 on a CPU.
– Bill, Commented Sep 10, 2018 at 19:10
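
The tolerance check suggested in the first comment could be sketched roughly as follows. The function name `normalize_strict` and the default `eps` are illustrative assumptions, not part of NumPy or scikit-learn.

```python
import numpy as np

def normalize_strict(v, eps=1e-12):
    """Return v scaled to unit L2 norm; raise if the norm is below eps.

    Hypothetical helper: the name and the default tolerance are assumptions.
    """
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm < eps:
        # Refuse to return a (near-)zero vector silently.
        raise ValueError("cannot normalize a vector with (near-)zero norm")
    return v / norm

unit = normalize_strict([3.0, 4.0])
print(unit)  # [0.6 0.8]
```

Whether raising is preferable to returning the input unchanged depends on the caller; the question's original function does the latter.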

ali_m · Accepted Answer · 2014-01-09 21:27:58Z

250

If you're using scikit-learn you can use sklearn.preprocessing.normalize:

import numpy as np
from sklearn.preprocessing import normalize

x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = normalize(x[:,np.newaxis], axis=0).ravel()
print(np.all(norm1 == norm2))
# True

edited Jan 9, 2014 at 21:27

answered Jan 9, 2014 at 21:15

ali_m


7 Comments

Donbeo Over a year ago

Thanks for the answer, but are you sure that sklearn.preprocessing.normalize also works with vectors of shape (n,) or (n,1)? I am having some problems with this library.

ali_m Over a year ago

normalize requires a 2D input. You can pass the axis= argument to specify whether you want to apply the normalization across the rows or columns of your input array.

Ash Over a year ago

Note that the 'norm' argument of the normalize function can be either 'l1' or 'l2' and the default is 'l2'. If you want your vector's sum to be 1 (e.g. a probability distribution) you should use norm='l1' in the normalize function.

Omid Over a year ago

Also note that np.linalg.norm(x) calculates 'l2' norm by default. If you want your vector's sum to be 1 you should use np.linalg.norm(x, ord=1)

Ramin Melikov Over a year ago

Note: x must be ndarray for it to work with the normalize() function. Otherwise it can be a list.
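
The difference between the 'l1' and 'l2' choices mentioned in the comments above can be illustrated with plain NumPy. This is a small sketch using an example array chosen here for illustration; it does not call sklearn.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# L2 normalization: the result has Euclidean length 1.
x_l2 = x / np.linalg.norm(x)          # default ord is the 2-norm for 1-D input
# L1 normalization: the absolute values of the result sum to 1.
x_l1 = x / np.linalg.norm(x, ord=1)

print(np.linalg.norm(x_l2))   # approximately 1.0
print(x_l1.sum())             # approximately 1.0
```

Note that the L1-normalized entries sum to 1 only when all entries are non-negative, since the L1 norm sums absolute values.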


Sandell0 · Accepted Answer · 2021-11-04 19:09:33Z

70

I agree that it would be nice if such a function were part of the included libraries. But it isn't, as far as I know. So here is a version for arbitrary axes; it is fully vectorized, so it should perform well.

import numpy as np

def normalized(a, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2==0] = 1
    return a / np.expand_dims(l2, axis)

A = np.random.randn(3,3,3)
print(normalized(A,0))
print(normalized(A,1))
print(normalized(A,2))

print(normalized(np.arange(3)[:,None]))
print(normalized(np.arange(3)))
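
As a quick sanity check of the function above, the following sketch repeats its definition (so the snippet runs on its own) and applies it to a small array chosen for illustration, including one zero row.

```python
import numpy as np

def normalized(a, axis=-1, order=2):
    # Same function as above, repeated so this snippet is self-contained.
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1          # zero vectors are divided by 1, i.e. left as zeros
    return a / np.expand_dims(l2, axis)

A = np.array([[3.0, 4.0],
              [0.0, 0.0]])   # second row is a zero vector
out = normalized(A, axis=1)
row_norms = np.linalg.norm(out, axis=1)

print(out)        # first row becomes [0.6, 0.8]; the zero row stays zero
print(row_norms)  # first norm is 1, second is 0
```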

edited Nov 4, 2021 at 19:09

Sandell0


answered Jan 9, 2014 at 21:59

Eelco Hoogendoorn


14 Comments

Eelco Hoogendoorn Over a year ago

I don't know; but it works over arbitrary axes, and we have explicit control over what happens for length 0 vectors.

Neil G Over a year ago

Very nice! This should be in numpy — although order should probably come before axis in my opinion.

Eelco Hoogendoorn Over a year ago

Because the Euclidean/Pythagorean norm happens to be the most frequently used one; wouldn't you agree?

bendl Over a year ago

Pretty late, but I think it's worth mentioning that this is exactly why it is discouraged to use lowercase 'L' as a variable name... in my typeface 'l2' is indistinguishable from '12'

anon01 Over a year ago

@bendl I think that's exactly why it's encouraged to use a better typeface


mrk · Accepted Answer · 2022-10-28 06:57:17Z

54

This might also work for you:

import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))

but it yields NaN values (with a runtime warning) when v is a zero vector, because the norm is then 0.

In that case, adding a small constant to the denominator avoids the division by zero.

As proposed in the comments one could also use

v/np.linalg.norm(v)

edited Oct 28, 2022 at 6:57

answered Jul 25, 2018 at 7:17

mrk


2 Comments

testing_22 Over a year ago

Or v/np.linalg.norm(v)

mrk Over a year ago

added your suggestion to the answer. Thanks for the contribution @testing_22

Eduard Feicho · Accepted Answer · 2022-05-02 15:44:50Z

29

To avoid zero division I use eps, but that's maybe not great.

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        norm = np.finfo(v.dtype).eps
    return v / norm

edited May 2, 2022 at 15:44

answered Nov 1, 2016 at 12:49

Eduard Feicho


4 Comments

pasbi Over a year ago

normalizing [inf, 1, 2] yields [nan, 0, 0], but shouldn't it be [1, 0, 0]?

Alessandro Muzzi Over a year ago

Some time has passed, but the answer is no: [nan, 0, 0] is correct. The norm is inf, and inf/inf is an indeterminate form: <anything>/inf is 0, but inf/<anything> is inf, so inf/inf cannot be determined.

NerdOnTour Over a year ago

Is there a reason for you to use the L1-norm? The OP seems to ask for L2-normalization.

Eduard Feicho Over a year ago

hm yeah should have been l2 norm
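
The [inf, 1, 2] case discussed in the comments above can be reproduced with a short sketch. It uses plain division by the norm rather than the function from the answer, since the zero-norm branch is not involved here.

```python
import numpy as np

v = np.array([np.inf, 1.0, 2.0])
norm = np.linalg.norm(v)            # the norm of a vector containing inf is inf

# inf / inf triggers an "invalid value" warning; silence it for this demo.
with np.errstate(invalid="ignore"):
    result = v / norm

print(norm)     # inf
print(result)   # first entry is nan, the other two are 0
```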

sergio verduzco · Accepted Answer · 2019-05-24 01:02:32Z

13

If you don't need utmost precision, your function can be reduced to:

v_norm = v / (np.linalg.norm(v) + 1e-16)
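
A small sketch of how the added constant behaves, wrapping the one-liner in a hypothetical helper named `normalize_eps` (an assumption for illustration): a zero vector comes back as zeros instead of NaN, while an ordinary float64 vector is effectively unaffected because 1e-16 is below the spacing of floats near its norm.

```python
import numpy as np

def normalize_eps(v, eps=1e-16):
    # Hypothetical wrapper around the one-liner above.
    return v / (np.linalg.norm(v) + eps)

zero = normalize_eps(np.zeros(3))
unit = normalize_eps(np.array([3.0, 4.0]))

print(zero)                   # all zeros, no NaN
print(np.linalg.norm(unit))   # approximately 1.0
```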

answered May 24, 2019 at 1:02

sergio verduzco


Jaden Travnik · Accepted Answer · 2019-09-03 17:02:20Z

12

If you have multidimensional data and want each axis normalized to its max or its sum:

def normalize(_d, to_sum=True, copy=True):
    # d is a (n x dimension) np array
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d

This uses NumPy's peak-to-peak function, np.ptp.

a = np.random.random((5, 3))

b = normalize(a, copy=False)
b.sum(axis=0) # array([1., 1., 1.]), each column sums to 1

c = normalize(a, to_sum=False, copy=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1

edited Sep 3, 2019 at 17:02

answered May 8, 2018 at 22:46

Jaden Travnik


1 Comment

Mcmil Over a year ago

Watch out if all values are the same in the original matrix, then ptp would be 0. Division by 0 will return nan.
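
One way to guard against the zero-range case raised in the comment above is sketched below. The helper `min_max_scale` is hypothetical, and mapping constant columns to 0 is a design assumption rather than the answer's behaviour.

```python
import numpy as np

def min_max_scale(d):
    """Scale each column of a 2-D array to [0, 1].

    Hypothetical helper: constant columns (range 0) are mapped to 0
    instead of NaN. Returns a new array; the input is not modified.
    """
    d = np.asarray(d, dtype=float)
    shifted = d - d.min(axis=0)
    span = np.ptp(d, axis=0)       # per-column max minus min
    span[span == 0] = 1.0          # avoid 0 / 0 for constant columns
    return shifted / span

a = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0]])         # second column is constant
scaled = min_max_scale(a)
print(scaled)   # first column: 0, 0.5, 1; second column: all 0
```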

Kraigolas · Accepted Answer · 2021-12-09 23:54:49Z

11

There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:

import transformations as trafo
import numpy as np

data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])

print(trafo.unit_vector(data, axis=1))

edited Dec 9, 2021 at 23:54

Kraigolas


answered Apr 17, 2018 at 8:39

Joe


J.Hirsch · Accepted Answer · 2020-01-30 15:55:30Z

9

You mentioned scikit-learn, so I want to share another solution.

scikit-learn MinMaxScaler

scikit-learn provides a class called MinMaxScaler, which lets you customize the value range as you like.

It also deals with NaN values for us.

NaNs are treated as missing values: disregarded in fit, and maintained in transform. ... see reference [1]

Code sample

The code is simple:

# Let's say X_train is your input dataframe
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# create a MinMaxScaler instance
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up if you need a dataframe
df = pd.DataFrame(X_train_norm)

Reference

[1] sklearn.preprocessing.MinMaxScaler

edited Jan 30, 2020 at 15:55

J.Hirsch


answered Mar 21, 2019 at 2:47

WY Hsu


1 Comment

Ricardo Decal Over a year ago

This does a different type of transform. The OP wanted to scale the magnitude of the vector so that each vector has a length of 1; MinMaxScaler individually scales each column independently to be within a certain range.

Stanislav Tsepa · Accepted Answer · 2019-10-17 22:15:22Z

8

If you work with a multidimensional array, the following fast solution is possible.

Say we have a 2D array that we want to normalize along the last axis, and some rows have zero norm.

import numpy as np
arr = np.array([
    [1, 2, 3],
    [0, 0, 0],
    [5, 6, 7]
], dtype=float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent

lengths = np.linalg.norm(arr, axis=-1)
print(lengths)  # [ 3.74165739  0.         10.48808848]
arr[lengths > 0] = arr[lengths > 0] / lengths[lengths > 0][:, np.newaxis]
print(arr)
# [[0.26726124 0.53452248 0.80178373]
# [0.         0.         0.        ]
# [0.47673129 0.57207755 0.66742381]]

answered Oct 17, 2019 at 22:15

Stanislav Tsepa


max0r · Accepted Answer · 2023-11-19 06:41:32Z

8

If you want to normalize n-dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:

import numpy as np
from torch import from_numpy
from torch.nn.functional import normalize

vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(from_numpy(vecs), dim=0, eps=1e-16).numpy()

edited Nov 19, 2023 at 6:41

answered Aug 31, 2018 at 7:01

max0r


paulmelnikow · Accepted Answer · 2019-01-31 21:27:18Z

7

If you're working with 3D vectors, you can do this concisely using the toolbelt vg. It's a light layer on top of numpy and it supports single values and stacked vectors.

import numpy as np
import vg

x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.all(norm1 == norm2))
# True

I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.

edited Jan 31, 2019 at 21:27

answered Jan 31, 2019 at 18:39

paulmelnikow


seralouk · Accepted Answer · 2019-06-26 20:21:09Z

7

Without sklearn, using just numpy, you can define a function.

Assuming that the rows are the variables and the columns the samples (axis= 1):

import numpy as np

# Example array
X = np.array([[1,2,3],[4,5,6]])

def stdmtx(X):
    means = X.mean(axis=1)
    stds = X.std(axis=1, ddof=1)
    X = X - means[:, np.newaxis]
    X = X / stds[:, np.newaxis]
    return np.nan_to_num(X)

output:

X
array([[1, 2, 3],
       [4, 5, 6]])

stdmtx(X)
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.]])

edited Jun 26, 2019 at 20:21

answered Jun 26, 2019 at 17:27

seralouk


1 Comment

Ricardo Decal Over a year ago

These output arrays do not have unit norm. Subtracting the mean and giving the samples unit variance does not produce unit vectors.

Cristian Arteaga · Accepted Answer · 2022-04-04 17:55:34Z

6

For a 2D array, you can use the following one-liner to normalize across rows. To normalize across columns, simply set axis=0.

a / np.linalg.norm(a, axis=1, keepdims=True)

answered Apr 4, 2022 at 17:55

Cristian Arteaga


1 Comment

Maksym Ganenko Over a year ago

Thanks for mentioning keepdims=True, that's truly useful for shape-invariant case
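
To make the role of keepdims=True concrete, here is a minimal sketch with an example array chosen for illustration. It assumes no row has zero norm; zero rows would need separate handling.

```python
import numpy as np

a = np.array([[3.0, 4.0],
              [6.0, 8.0],
              [0.0, 5.0]])

# keepdims=True gives shape (3, 1), which broadcasts against a's shape (3, 2).
# Without it the norms would have shape (3,), which does not broadcast row-wise.
row_norms = np.linalg.norm(a, axis=1, keepdims=True)
unit_rows = a / row_norms

print(row_norms.shape)                    # (3, 1)
print(np.linalg.norm(unit_rows, axis=1))  # each entry is approximately 1.0
```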

sergzach · Accepted Answer · 2021-12-06 22:13:59Z

1

If you want all values in [0, 1] for a 1-D array, just use

(a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

Where a is your 1d-array.

An example:

>>> a = np.array([0, 1, 2, 4, 5, 2])
>>> (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
array([0. , 0.2, 0.4, 0.8, 1. , 0.4])

Note on this method: it preserves the proportions between values only if the 1-D array contains at least one 0 and otherwise only positive numbers.

edited Dec 6, 2021 at 22:13

answered Dec 6, 2021 at 22:08

sergzach


Ka Wa Yip · Accepted Answer · 2022-02-11 08:20:22Z

1

A simple dot product would do the job. No need for any extra package.

x = x/np.sqrt(x.dot(x))

By the way, if the norm of x is zero, it is inherently a zero vector, and cannot be converted to a unit vector (which has norm 1). If you want to catch the case of np.array([0,0,...0]), then use

norm = np.sqrt(x.dot(x))
x = x/norm if norm != 0 else x

answered Feb 11, 2022 at 8:20

Ka Wa Yip


1 Comment

user111950 Over a year ago

I often use this trick: x_normalised = x / (norm+(norm==0)) so in all cases where the norm is zero, you just divide by one.
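
The trick from the comment above also works for a stack of vectors. The following is a sketch under the assumption that the vectors are the rows of a 2-D array; the example data is illustrative.

```python
import numpy as np

x = np.array([[1.0, 2.0, 2.0],
              [0.0, 0.0, 0.0]])   # second row is a zero vector

norm = np.linalg.norm(x, axis=1, keepdims=True)
# (norm == 0) is a boolean array; adding it turns every zero norm into 1
# and leaves non-zero norms unchanged.
x_normalised = x / (norm + (norm == 0))

print(x_normalised)   # first row has unit length, zero row stays zero
```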

Oliver Jennrich · Accepted Answer · 2023-09-23 11:11:25Z

Unfortunately the simple solution x/numpy.linalg.norm(x) doesn't work if x is an array of vectors. But with a simple reshape() you can force it into a flat list, use a list comprehension, and use reshape() again to get back the original shape.

s=x.shape
np.array([ v/np.linalg.norm(v)  for v in x.reshape(-1, s[-1])]).reshape(s)

First we store the shape of the array

s=x.shape

Then we reshape it into a flat list of vectors (a two-dimensional array with one vector per row)

x.reshape(-1, s[-1])

by making use of the '-1' argument of reshape(), which essentially means "take as many as needed". For example, if x were a (4,5,3) array, x.reshape(-1,3) would have shape (20,3). Using s[-1] allows for vectors of arbitrary dimension.

Then we use a list comprehension to step through the array and calculate the unit vector one vector at a time

[ v/np.linalg.norm(v)  for v in x.reshape(-1, s[-1])]

and finally we turn it back into a NumPy array and restore its original shape

np.array([ v/np.linalg.norm(v)  for v in x.reshape(-1, s[-1])]).reshape(s)
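
Put together, the steps above can be run as follows. The example array is random data chosen for illustration, and the final comparison against a vectorized call with axis=-1 and keepdims=True is an added cross-check, not part of the approach described above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 5, 3)) + 0.1     # a 4 x 5 grid of non-zero 3-vectors

s = x.shape
looped = np.array(
    [v / np.linalg.norm(v) for v in x.reshape(-1, s[-1])]
).reshape(s)

# Assumed alternative: normalize along the last axis in one vectorized call.
vectorized = x / np.linalg.norm(x, axis=-1, keepdims=True)

print(looped.shape)                      # (4, 5, 3)
print(np.allclose(looped, vectorized))   # True
```

The list comprehension is easy to follow, but it loops in Python; for large arrays the vectorized form is likely to be faster.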
