
Create a matrix like a transition matrix: how can I create a random matrix whose columns each sum to 1 in Python?


2 Answers

3

(EDIT: added output)

I suggest completing this in two steps:

  1. Create a random matrix

  2. Normalize each column

1. Create random matrix

Let's say you want a 3 by 3 random transition matrix:

M = np.random.rand(3, 3)

Each of M's entries will be a random value drawn uniformly from [0, 1).

2. Normalize M's columns

Dividing each column by its column sum achieves what you want. This can be done in several ways, but I prefer to create an array r whose elements are the column sums of M:

r = M.sum(axis=0)

Then, divide M by r:

transition_matrix = M / r
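
The division works column by column because of NumPy broadcasting: r has shape (3,), so it is lined up with the last axis of M and column j is divided by r[j]. Here is a minimal sketch of that, assuming a seeded np.random.default_rng generator (my choice for reproducibility, not part of the answer above). It also shows the row-normalized variant, where keepdims=True is needed so that each row is divided by its own sum:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
M = rng.random((3, 3))          # uniform values in [0, 1)

# Column-stochastic: the sums have shape (3,), so column j is divided by its own sum.
col_stochastic = M / M.sum(axis=0)

# Row-stochastic: keepdims=True gives shape (3, 1), so row i is divided by its own sum.
row_stochastic = M / M.sum(axis=1, keepdims=True)

print(np.allclose(col_stochastic.sum(axis=0), 1.0))
print(np.allclose(row_stochastic.sum(axis=1), 1.0))
```

Which variant you want depends on whether your transition matrix convention puts probabilities in columns (as asked here) or in rows.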

Example output

>>> import numpy as np

>>> M = np.random.rand(3, 3)
>>> r = M.sum(axis=0)
>>> transition_matrix = M / r

>>> M
array([[0.74145687, 0.68389986, 0.37008102],
       [0.81869654, 0.0394523 , 0.94880781],
       [0.93057194, 0.48279246, 0.15581823]])
>>> r
array([2.49072535, 1.20614462, 1.47470706])
>>> transition_matrix
array([[0.29768713, 0.56701315, 0.25095223],
       [0.32869804, 0.03270943, 0.64338731],
       [0.37361483, 0.40027743, 0.10566046]])
>>> transition_matrix.sum(axis=0)
array([1., 1., 1.])

3 Comments

You do realize that the distribution of those numbers would be ... what?
As described in the docs, numpy.random.rand uses a uniform distribution over [0, 1).
And what would the distribution be for values (in the simplest case of a 2x2 matrix) like X1/(X1+X2), where X1 and X2 are both U(0,1)? I added an update to my answer to discuss the problem.
1

You could use a known distribution whose samples sum to one by construction, e.g. the Dirichlet distribution.

After that, the code is basically a one-liner (Python 3.8, Windows 10 x64):

import numpy as np

N = 3

# set alphas array, 1s by default
a = np.ones(N)

# each of the N samples (rows) sums to 1; transpose so that the columns sum to 1
mtx = np.random.dirichlet(a, N).transpose()

print(mtx)

and it will print something like

[[0.56634637 0.04568052 0.79105779]
 [0.42542107 0.81892862 0.02465906]
 [0.00823256 0.13539087 0.18428315]]
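
To confirm that the columns really sum to 1, here is a small check. It is a sketch that assumes the newer np.random.default_rng generator with an arbitrary seed instead of the legacy np.random.dirichlet call above; the sampling itself is the same idea:

```python
import numpy as np

N = 3
rng = np.random.default_rng(42)  # seed is an assumption, used only for reproducibility

# Each row drawn from Dirichlet(1, ..., 1) sums to 1; transposing moves that to the columns.
mtx = rng.dirichlet(np.ones(N), size=N).T

column_sums = mtx.sum(axis=0)
print(column_sums)
print(np.allclose(column_sums, 1.0))
```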

UPDATE

With the "sample something and normalize" approach, the problem is that the resulting values follow an unknown distribution. For the Dirichlet distribution there are expressions for the mean, std. dev., PDF, CDF, you name it.

Even when the Xi are sampled from U(0,1), what is the distribution of the values Xi/Sum(i, Xi)?

Is there anything to say about the mean? Std. dev.? PDF? Other statistical properties?

You could sample from an exponential distribution and normalize the sum to 1, but the question becomes even more acute: if Xi is Exp(1), what is the distribution of Xi/Sum(i, Xi)? PDF? Mean? Std. dev.?
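
One way to get a feel for these questions is to estimate the moments by simulation. The sketch below is only an empirical comparison, with the sample size, seed, and the helper name normalize all being my own assumptions. For what it is worth, normalizing independent Gamma(alpha_i, 1) variables is a standard way to construct a Dirichlet(alpha) sample, and Exp(1) is Gamma(1, 1), so the normalized exponential case should match Dirichlet(1, ..., 1); the normalized uniform case should share the mean 1/N by symmetry but not the spread.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
N, samples = 3, 200_000         # hypothetical sizes chosen for illustration


def normalize(x):
    """Divide each row by its sum so that every row sums to 1."""
    return x / x.sum(axis=1, keepdims=True)


uniform_norm = normalize(rng.random((samples, N)))
expo_norm = normalize(rng.exponential(1.0, size=(samples, N)))
dirichlet = rng.dirichlet(np.ones(N), size=samples)

# Compare the empirical mean and variance of the first component in each scheme.
for name, s in [("uniform/sum", uniform_norm),
                ("exp/sum", expo_norm),
                ("dirichlet", dirichlet)]:
    print(name, s[:, 0].mean(), s[:, 0].var())
```

If the three rows of output show the same mean but a visibly smaller variance for the uniform case, that illustrates the point of the update: the normalized values are valid probabilities either way, but only some sampling schemes give you a distribution with known properties.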
