122

Given a 3×3 NumPy array:

import numpy

a = numpy.arange(0,27,3).reshape(3,3)

# array([[ 0,  3,  6],
#        [ 9, 12, 15],
#        [18, 21, 24]])

To normalize the rows of the 2-dimensional array, I thought of:

row_sums = a.sum(axis=1) # array([ 9, 36, 63])
new_matrix = numpy.zeros((3,3))
for i, (row, row_sum) in enumerate(zip(a, row_sums)):
    new_matrix[i,:] = row / row_sum

There must be a better way, right?

Perhaps to clarify: by normalizing I mean that the entries in each row must sum to one. But I think that will be clear to most people.

2
  • 20
    Careful, "normalize" usually means the square sum of components is one. Your definition will hardly be clear to most people;) Commented Jul 13, 2015 at 18:10
  • 7
    @coldfix is talking about the L2 norm and considers it the most common (which may be true), while Aufwind uses the L1 norm, which is also a norm. Commented Feb 12, 2021 at 9:50

12 Answers

181

Broadcasting is really good for this:

row_sums = a.sum(axis=1)
new_matrix = a / row_sums[:, numpy.newaxis]

row_sums[:, numpy.newaxis] reshapes row_sums from being (3,) to being (3, 1). When you do a / b, a and b are broadcast against each other.
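
As a minimal sketch of the shapes involved (the name `col_vector` is only illustrative and not part of the answer above):

```python
import numpy as np

a = np.arange(0, 27, 3).reshape(3, 3)
row_sums = a.sum(axis=1)                 # shape (3,)
col_vector = row_sums[:, np.newaxis]     # shape (3, 1)

# (3, 3) / (3, 1): the single column is stretched across all three columns,
# so every element is divided by the sum of its own row.
new_matrix = a / col_vector

print(row_sums.shape, col_vector.shape)  # (3,) (3, 1)
print(new_matrix.sum(axis=1))            # each row sums to 1, up to rounding
```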

You can learn more about broadcasting here or even better here.


10 Comments

This can be simplified even further using a.sum(axis=1, keepdims=True) to keep the singleton column dimension, which you can then broadcast along without having to use np.newaxis.
what if any of the row_sums is zero?
This is the correct answer for the question as stated above - but if a normalization in the usual sense is desired, use np.linalg.norm instead of a.sum!
is this preferred to row_sums.reshape(3,1) ?
It's not as robust since the row sum may be 0.
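
The two points raised in these comments (using keepdims and guarding against zero row sums) could be combined roughly as follows. This is only a sketch: leaving all-zero rows as zeros is an assumption about the desired behavior, and the example array is made up for illustration.

```python
import numpy as np

a = np.array([[0.0, 3.0, 6.0],
              [0.0, 0.0, 0.0],     # a row whose sum is zero
              [18.0, 21.0, 24.0]])

# keepdims=True keeps the result as shape (3, 1), ready for broadcasting.
row_sums = a.sum(axis=1, keepdims=True)

# Divide only where the row sum is nonzero; the other rows keep the zeros
# that were placed in `out` (an assumed convention, adjust as needed).
safe = np.divide(a, row_sums, out=np.zeros_like(a), where=row_sums != 0)
```

Without the `where` guard, the all-zero row would produce NaN values and a runtime warning.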
137

Scikit-learn offers a function normalize() that lets you apply various normalizations. Making each row sum to 1 corresponds to normalizing by the L1 norm (for non-negative entries). Therefore:

import numpy
from sklearn.preprocessing import normalize

matrix = numpy.arange(0,27,3).reshape(3,3).astype(numpy.float64)
# array([[  0.,   3.,   6.],
#        [  9.,  12.,  15.],
#        [ 18.,  21.,  24.]])

normed_matrix = normalize(matrix, axis=1, norm='l1')
# [[ 0.          0.33333333  0.66666667]
#  [ 0.25        0.33333333  0.41666667]
#  [ 0.28571429  0.33333333  0.38095238]]

Now your rows will sum to 1.

1 Comment

This also has the advantage that it works on sparse arrays that would not fit into memory as dense arrays.
11

I think this should work,

a = numpy.arange(0,27.,3).reshape(3,3)

a /= a.sum(axis=1)[:,numpy.newaxis]

1 Comment

good. note the change of dtype to arange, by appending decimal point to 27.
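
To illustrate why the dtype matters for the in-place version, here is a small sketch. It assumes a reasonably recent NumPy, where in-place true division on an integer array raises a TypeError; the names `a_int`, `a_float` and `in_place_failed` are illustrative only.

```python
import numpy as np

a_int = np.arange(0, 27, 3).reshape(3, 3)            # integer dtype
try:
    # The float result cannot be written back into an integer array.
    a_int /= a_int.sum(axis=1, keepdims=True)
    in_place_failed = False
except TypeError:
    in_place_failed = True

# Converting explicitly has the same effect as writing "27." in arange.
a_float = a_int.astype(np.float64)
a_float /= a_float.sum(axis=1, keepdims=True)
```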
6

In case you are trying to normalize each row such that its magnitude is one (i.e. each row has unit length, so the sum of the squares of its elements is one):

import numpy as np

a = np.arange(0,27,3).reshape(3,3)

result = a / np.linalg.norm(a, axis=-1)[:, np.newaxis]
# array([[ 0.        ,  0.4472136 ,  0.89442719],
#        [ 0.42426407,  0.56568542,  0.70710678],
#        [ 0.49153915,  0.57346234,  0.65538554]])

Verifying:

np.sum( result**2, axis=-1 )
# array([ 1.,  1.,  1.]) 

2 Comments

Axis doesn't seem to be a parameter to np.linalg.norm (anymore?).
notably this corresponds to the l2 norm (where as rows summing to 1 corresponds to the l1 norm)
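
For comparison, both normalizations can be written with np.linalg.norm, which accepts axis and keepdims in current NumPy releases. A minimal sketch (the names `l1` and `l2` are illustrative):

```python
import numpy as np

a = np.arange(0, 27, 3).reshape(3, 3).astype(float)

# L1: divide by the sum of absolute values, so rows sum to 1
# when all entries are non-negative, as they are here.
l1 = a / np.linalg.norm(a, ord=1, axis=1, keepdims=True)

# L2: divide by the Euclidean length, so squared entries sum to 1.
l2 = a / np.linalg.norm(a, ord=2, axis=1, keepdims=True)

print(l1.sum(axis=1))
print((l2 ** 2).sum(axis=1))
```

Note that the L1 norm uses absolute values, so it matches a plain a.sum(axis=1) only when no entry is negative.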
5

I think you can normalize the rows so that their elements sum to 1 with new_matrix = a / a.sum(axis=1, keepdims=True). Column normalization can be done with new_matrix = a / a.sum(axis=0, keepdims=True). Hope this helps.

Comments

2

You could use the built-in NumPy function: np.linalg.norm(a, axis = 1, keepdims = True)

1 Comment

This computes the norm and does not normalize the matrix
1

it appears that this also works

def normalizeRows(M):
    # keepdims=True gives shape (n, 1), so each row is divided by its own sum
    row_sums = M.sum(axis=1, keepdims=True)
    return M / row_sums
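
The column dimension of the row sums matters for this kind of division. A minimal sketch, using the array from the question, of what happens when it is dropped (the names `wrong` and `right` are illustrative):

```python
import numpy as np

a = np.arange(0, 27, 3).reshape(3, 3)
row_sums = a.sum(axis=1)                 # shape (3,): [ 9, 36, 63]

# A (3,) divisor is aligned with the last axis, so column j is divided
# by row_sums[j] instead of row i being divided by row_sums[i].
wrong = a / row_sums
right = a / row_sums[:, np.newaxis]

print(wrong.sum(axis=1))                 # not all ones
print(right.sum(axis=1))                 # all ones, up to rounding
```

No error is raised in the first case because the shapes happen to be compatible for a square array, which makes the mistake easy to miss.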

Comments

0

You could also use matrix transposition:

(a.T / row_sums).T

2 Comments

this answer is incomplete without how you computed row_sums
It is in the original question: row_sums = a.sum(axis=1)
0

Here is one more possible way using reshape:

a_norm = (a/a.sum(axis=1).reshape(-1,1)).round(3)
print(a_norm)

Or using None works too:

a_norm = (a/a.sum(axis=1)[:,None]).round(3)
print(a_norm)

Output:

array([[0.   , 0.333, 0.667],
       [0.25 , 0.333, 0.417],
       [0.286, 0.333, 0.381]])

Comments

0

Use

a = a / np.linalg.norm(a, ord = 2, axis = 1, keepdims = True)

Thanks to broadcasting, this divides each row by its L2 norm. (Use axis = 0 to normalize columns instead.)

Comments

-1

Or using a lambda function, like

>>> import numpy as np
>>> vec = np.arange(0,27,3).reshape(3,3)
>>> norm_vec = np.array(list(map(lambda row: row/np.linalg.norm(row), vec)))

Each row of norm_vec will have unit (L2) norm.

1 Comment

is this using python's map? won't builtin numpy functions be much faster?
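
As a rough sketch of the alternative this comment hints at, the per-row Python loop and a single vectorized call give the same result (the names `looped` and `vectorized` are illustrative):

```python
import numpy as np

vec = np.arange(0, 27, 3).reshape(3, 3)

# Python-level iteration: one np.linalg.norm call per row.
looped = np.array([row / np.linalg.norm(row) for row in vec])

# Vectorized: one call computes every row norm, broadcasting does the division.
vectorized = vec / np.linalg.norm(vec, axis=1, keepdims=True)

print(np.allclose(looped, vectorized))
```

The vectorized form avoids per-row Python overhead, which usually matters more as the number of rows grows.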
-1

We can achieve the same effect by premultiplying with the diagonal matrix whose main diagonal is the reciprocal of the row sums.

A = np.diag(1.0 / A.sum(axis=1)) @ A

2 Comments

too inefficient. you turned a simple sum over all elements into a big (sparse) matrix multiplication
@qwr The original poster did not ask for a more efficient version, only a less "verbose" one.
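
For reference, a small sketch showing that the diagonal-matrix product and plain broadcasting give the same result (the names `inv_row_sums`, `via_diag` and `via_broadcast` are illustrative):

```python
import numpy as np

A = np.arange(0, 27, 3).reshape(3, 3)
inv_row_sums = 1.0 / A.sum(axis=1)               # float reciprocals of the row sums

# Builds a dense n x n diagonal matrix, then performs a full matrix product.
via_diag = np.diag(inv_row_sums) @ A

# Scales each row directly without building the intermediate matrix.
via_broadcast = A * inv_row_sums[:, np.newaxis]

print(np.allclose(via_diag, via_broadcast))
```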
