328

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm

This function handles the situation where vector v has the norm value of 0.

Are there any similar functions provided in sklearn or numpy?

4
  • 17
    What's wrong with what you've written? Commented Jan 9, 2014 at 20:30
  • 7
    If this is really a concern, you should check for norm < epsilon, where epsilon is a small tolerance. In addition, I wouldn't silently pass back a norm zero vector, I would raise an exception! Commented Jan 9, 2014 at 20:51
  • 9
    My function works, but I would like to know whether there is something in Python's more common libraries. I am writing various machine learning functions and would like to avoid defining too many new functions, to keep the code clear and readable. Commented Jan 9, 2014 at 21:08
  • 3
    I did a few quick tests and I found that x/np.linalg.norm(x) was not much slower (about 15-20%) than x/np.sqrt((x**2).sum()) in numpy 1.15.1 on a CPU. Commented Sep 10, 2018 at 19:10
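
The tolerance check and the exception suggested in the comments above could be sketched as follows. This is only an illustration: the name `normalize_strict` and the default `eps=1e-12` are assumptions made for this example, not part of NumPy or of the original function.

```python
import numpy as np

def normalize_strict(v, eps=1e-12):
    # Hypothetical helper: `normalize_strict` and `eps` are illustrative names.
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm < eps:
        # Refuse to return a non-unit vector silently.
        raise ValueError("cannot normalize a vector with (near-)zero norm")
    return v / norm

u = normalize_strict([3.0, 4.0])
print(np.linalg.norm(u))
```

Whether to raise or to return the input unchanged depends on the caller; raising makes the failure visible instead of passing a zero vector downstream.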

16 Answers

250

If you're using scikit-learn you can use sklearn.preprocessing.normalize:

import numpy as np
from sklearn.preprocessing import normalize

x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = normalize(x[:,np.newaxis], axis=0).ravel()
print(np.allclose(norm1, norm2))
# True

7 Comments

Thanks for the answer, but are you sure that sklearn.preprocessing.normalize also works with vectors of shape (n,) or (n,1)? I am having some problems with this library.
normalize requires a 2D input. You can pass the axis= argument to specify whether you want to apply the normalization across the rows or columns of your input array.
Note that the 'norm' argument of the normalize function can be either 'l1' or 'l2' and the default is 'l2'. If you want your vector's sum to be 1 (e.g. a probability distribution) you should use norm='l1' in the normalize function.
Also note that np.linalg.norm(x) calculates 'l2' norm by default. If you want your vector's sum to be 1 you should use np.linalg.norm(x, ord=1)
Note: x must be ndarray for it to work with the normalize() function. Otherwise it can be a list.
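
The difference between the L2 and L1 norms mentioned in the comments above can be checked with plain NumPy. This is a small sketch with made-up data; note that the entries sum to 1 after L1 normalisation only when they are all non-negative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Default norm is L2 (Euclidean): the result has length 1.
l2_unit = x / np.linalg.norm(x)

# ord=1 gives the L1 norm (sum of absolute values): the result sums to 1
# here because every entry of x is non-negative.
l1_unit = x / np.linalg.norm(x, ord=1)

print(np.linalg.norm(l2_unit))
print(l1_unit.sum())
```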
70

I agree that it would be nice if such a function were part of the included libraries. But it isn't, as far as I know. So here is a version for arbitrary axes that gives optimal performance.

import numpy as np

def normalized(a, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2==0] = 1
    return a / np.expand_dims(l2, axis)

A = np.random.randn(3,3,3)
print(normalized(A,0))
print(normalized(A,1))
print(normalized(A,2))

print(normalized(np.arange(3)[:,None]))
print(normalized(np.arange(3)))

14 Comments

I don't know; but it works over arbitrary axes, and we have explicit control over what happens for length 0 vectors.
Very nice! This should be in numpy — although order should probably come before axis in my opinion.
Because the Euclidean/Pythagorean norm happens to be the most frequently used one; wouldn't you agree?
Pretty late, but I think it's worth mentioning that this is exactly why it is discouraged to use lowercase 'L' as a variable name... in my typeface 'l2' is indistinguishable from '12'
@bendl I think that's exactly why it's encouraged to use a better typeface
54

This might also work for you

import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))

but fails when v has length 0.

In that case, introducing a small constant to prevent the zero division solves this.

As proposed in the comments one could also use

v/np.linalg.norm(v)

2 Comments

Or v/np.linalg.norm(v)
added your suggestion to the answer. Thanks for the contribution @testing_22
29

To avoid zero division I use eps, but that's maybe not great.

def normalize(v):
    norm=np.linalg.norm(v)
    if norm==0:
        norm=np.finfo(v.dtype).eps
    return v/norm

4 Comments

normalizing [inf, 1, 2] yields [nan, 0, 0], but shouldn't it be [1, 0, 0]?
Some time has passed but the answer is no, [nan, 0, 0] is correct since the norm is inf and inf/inf is an indeterminate form because <everything>/inf is 0 but is also true that inf/<everything> is inf, so inf/inf cannot be determined.
Is there a reason for you to use the L1-norm? The OP seems to ask for L2-normalization.
hm yeah should have been l2 norm
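
The [inf, 1, 2] case discussed in the comments above can be reproduced directly. A minimal sketch; `np.errstate` is used only to silence the floating-point warning that inf/inf triggers.

```python
import numpy as np

v = np.array([np.inf, 1.0, 2.0])
norm = np.linalg.norm(v)  # the norm of a vector containing inf is inf

with np.errstate(invalid="ignore"):
    # inf / inf is NaN, while any finite value / inf is 0.
    result = v / norm

print(result)
```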
13

If you don't need utmost precision, your function can be reduced to:

v_norm = v / (np.linalg.norm(v) + 1e-16)
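
To see where the precision is lost, it may help to compare an ordinary vector with one whose norm is comparable to the added constant. The sample vectors below are made up for illustration.

```python
import numpy as np

v_ok = np.array([3.0, 4.0])
v_tiny = np.array([1e-16, 0.0])
v_zero = np.zeros(2)

unit_ok = v_ok / (np.linalg.norm(v_ok) + 1e-16)
unit_tiny = v_tiny / (np.linalg.norm(v_tiny) + 1e-16)
unit_zero = v_zero / (np.linalg.norm(v_zero) + 1e-16)

print(np.linalg.norm(unit_ok))    # close to 1: the constant is negligible
print(np.linalg.norm(unit_tiny))  # about 0.5: the constant doubles the divisor
print(unit_zero)                  # zeros, with no division by zero
```

So the shortcut is safe for zero vectors and accurate for vectors of ordinary magnitude, but it does not return a unit vector when the norm is of the same order as the constant.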

12

If you have multidimensional data and want each axis normalized to its max or its sum:

def normalize(_d, to_sum=True, copy=True):
    # d is a (n x dimension) np array
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d

Uses NumPy's peak-to-peak function (np.ptp).

a = np.random.random((5, 3))

b = normalize(a, copy=False)
b.sum(axis=0) # array([1., 1., 1.]), the columns sum to 1

c = normalize(a, to_sum=False, copy=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1

1 Comment

Watch out if all values are the same in the original matrix, then ptp would be 0. Division by 0 will return nan.
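
One possible way to guard against the zero-range case from the comment above is to replace a zero span with 1 before dividing. This is a sketch only; `min_max_columns` is a hypothetical helper name, and mapping constant columns to 0 is a design choice made for this example.

```python
import numpy as np

def min_max_columns(d):
    # Hypothetical helper: scale each column to [0, 1].
    d = np.asarray(d, dtype=float)
    shifted = d - d.min(axis=0)
    span = np.ptp(d, axis=0)   # max - min per column
    span[span == 0] = 1.0      # constant column: divide by 1 instead of 0
    return shifted / span

a = np.array([[1.0, 5.0],
              [3.0, 5.0],
              [2.0, 5.0]])
scaled = min_max_columns(a)
print(scaled)
```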
11

There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:

import transformations as trafo
import numpy as np

data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])

print(trafo.unit_vector(data, axis=1))

9

You mentioned scikit-learn, so I want to share another solution.

scikit-learn MinMaxScaler

scikit-learn has an API called MinMaxScaler that lets you customize the value range as you like.

It also deals with NaN issues for us.

NaNs are treated as missing values: disregarded in fit, and maintained in transform. ... see reference [1]

Code sample

The code is simple, just type

# Let's say X_train is your input dataframe
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# create a MinMaxScaler object
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up if you need a dataframe
df = pd.DataFrame(X_train_norm)
Reference

1 Comment

This does a different type of transform. The OP wanted to scale the magnitude of the vector so that each vector has a length of 1; MinMaxScaler individually scales each column independently to be within a certain range.
8

If you work with a multidimensional array, the following fast solution is possible.

Say we have a 2D array that we want to normalize along the last axis, and some of its rows have zero norm.

import numpy as np
arr = np.array([
    [1, 2, 3], 
    [0, 0, 0],
    [5, 6, 7]
], dtype=float)

lengths = np.linalg.norm(arr, axis=-1)
print(lengths)  # [ 3.74165739  0.         10.48808848]
arr[lengths > 0] = arr[lengths > 0] / lengths[lengths > 0][:, np.newaxis]
print(arr)
# [[0.26726124 0.53452248 0.80178373]
# [0.         0.         0.        ]
# [0.47673129 0.57207755 0.66742381]]

8

If you want to normalize n dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:

import numpy as np
from torch import from_numpy
from torch.nn.functional import normalize

vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(from_numpy(vecs), dim=0, eps=1e-16).numpy()

7

If you're working with 3D vectors, you can do this concisely using the toolbelt vg. It's a light layer on top of numpy and it supports single values and stacked vectors.

import numpy as np
import vg

x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.allclose(norm1, norm2))
# True

I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.

7

Without sklearn, using just numpy, define a function.

Assuming that the rows are the variables and the columns the samples (axis=1):

import numpy as np

# Example array
X = np.array([[1,2,3],[4,5,6]])

def stdmtx(X):
    means = X.mean(axis =1)
    stds = X.std(axis= 1, ddof=1)
    X= X - means[:, np.newaxis]
    X= X / stds[:, np.newaxis]
    return np.nan_to_num(X)

output:

X
array([[1, 2, 3],
       [4, 5, 6]])

stdmtx(X)
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.]])

1 Comment

These output arrays do not have unit norm. Subtracting the mean and giving the samples unit variance does not produce unit vectors.
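
The point made in the comment above can be verified numerically by comparing row norms after standardisation with row norms after dividing by the Euclidean norm. A short sketch reusing the example array from the answer:

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Standardisation: zero mean and unit sample variance per row.
standardized = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

# Unit vectors: divide each row by its Euclidean norm.
unit_rows = X / np.linalg.norm(X, axis=1, keepdims=True)

print(np.linalg.norm(standardized, axis=1))  # sqrt(2) per row here, not 1
print(np.linalg.norm(unit_rows, axis=1))     # 1 per row, up to rounding
```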
6

For a 2D array, you can use the following one-liner to normalize across rows. To normalize across columns, simply set axis=0.

a / np.linalg.norm(a, axis=1, keepdims=True)

1 Comment

Thanks for mentioning keepdims=True, that's truly useful for shape-invariant case
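
As a rough illustration of why keepdims=True matters here: without it the norms lose the reduced axis and no longer broadcast against the rows of a non-square array. The sample array is made up.

```python
import numpy as np

a = np.arange(1.0, 7.0).reshape(3, 2)

norms_flat = np.linalg.norm(a, axis=1)                 # shape (3,)
norms_kept = np.linalg.norm(a, axis=1, keepdims=True)  # shape (3, 1)

# (3, 2) / (3, 1) broadcasts row by row; (3, 2) / (3,) would raise an error.
unit_rows = a / norms_kept

# Same idea across columns: the kept axis has shape (1, 2).
unit_cols = a / np.linalg.norm(a, axis=0, keepdims=True)

print(norms_flat.shape, norms_kept.shape)
```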
1

If you want all values in [0; 1] for 1d-array then just use

(a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

Where a is your 1d-array.

An example:

>>> a = np.array([0, 1, 2, 4, 5, 2])
>>> (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
array([0. , 0.2, 0.4, 0.8, 1. , 0.4])

A note on this method: it preserves the proportions between values only if the 1d-array consists of zeros and positive numbers and contains at least one 0, so that the minimum is 0.

1

A simple dot product would do the job. No need for any extra package.

x = x/np.sqrt(x.dot(x))

By the way, if the norm of x is zero, it is inherently a zero vector, and cannot be converted to a unit vector (which has norm 1). If you want to catch the case of np.array([0,0,...0]), then use

norm = np.sqrt(x.dot(x))
x = x/norm if norm != 0 else x

1 Comment

I often use this trick: x_normalised = x / (norm+(norm==0)) so in all cases where the norm is zero, you just divide by one.
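
The trick from the comment above works because `norm == 0` evaluates to True (treated as 1) only when the norm is zero, and to False (0) otherwise. A minimal sketch; `normalize_or_keep` is a hypothetical name chosen for this example.

```python
import numpy as np

def normalize_or_keep(x):
    # Hypothetical helper wrapping the trick from the comment.
    x = np.asarray(x, dtype=float)
    norm = np.sqrt(x.dot(x))
    # Adds 1 to the divisor only when norm is exactly 0.
    return x / (norm + (norm == 0))

print(normalize_or_keep([3.0, 4.0]))
print(normalize_or_keep([0.0, 0.0]))
```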
0

Unfortunately the simple solution x/numpy.linalg.norm(x) doesn't work if x is an array of vectors. But with a simple reshape() you can force it into a flat list, use a list comprehension, and use reshape() again to get back the original shape.

s=x.shape
np.array([ v/np.linalg.norm(v)  for v in x.reshape(-1, s[-1])]).reshape(s)

First we store the shape of the array

s=x.shape

Then we reshape it into a simple (one-dimensional) array of vectors

x.reshape(-1, s[-1])

by making use of the '-1' argument of reshape(), which essentially means "take as many as it needs", e.g. if x was a (4,5,3) array, x.reshape(-1,3) would be of shape (20,3). The use of s[-1] allows for an arbitrary dimension of the vectors.

Then we use a list comprehension to step through the array and calculate the unit vector one vector at a time

[ v/np.linalg.norm(v)  for v in x.reshape(-1, s[-1])]

and finally we turn it back into a NumPy array and give it back its original shape

np.array([ v/np.linalg.norm(v)  for v in x.reshape(-1, s[-1])]).reshape(s)
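
For comparison, the axis and keepdims arguments of np.linalg.norm should give the same result without a Python-level loop. The sketch below checks this on random data; the array shape and the seed are arbitrary choices for the example.

```python
import numpy as np

x = np.random.default_rng(0).normal(size=(4, 5, 3))
s = x.shape

# Loop-based version from the answer above.
looped = np.array([v / np.linalg.norm(v) for v in x.reshape(-1, s[-1])]).reshape(s)

# Vectorised version: one norm per vector along the last axis.
vectorized = x / np.linalg.norm(x, axis=-1, keepdims=True)

print(np.allclose(looped, vectorized))  # True
```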
