Probability functions convolution in python

Question

There are N distributions, each taking integer values 0, ... with associated probabilities. As an example, take 3 variables, each stored as rows of [value, prob]:

import numpy as np
x = np.array([ [0,0.3],[1,0.2],[3,0.5] ])
y = np.array([ [10,0.2],[11,0.4],[13,0.1],[14,0.3] ])
z = np.array([ [21,0.3],[23,0.7] ])

Since there are N variables, I first convolve x+y, then add z, and so on. Unfortunately numpy.convolve() takes 1-d arrays as input, so it does not suit this case directly. I have tried extending each variable to take all the values 0,1,2,...,23 (with Pr=0 where a value does not occur)... I feel there must be a much better solution.

Does anyone have a suggestion for making it more efficient? Thanks in advance.

user6655984 · Accepted Answer · 2016-12-14 16:56:21Z


I don't see a built-in method for this in SciPy; there is a way to define custom discrete random variables, but those don't support addition. Here is an approach using pandas, assuming import pandas as pd and x, y, z as in your example:

values = np.add.outer(x[:,0], y[:,0]).flatten()
probs = np.multiply.outer(x[:,1], y[:,1]).flatten()
df = pd.DataFrame({'values': values, 'probs': probs})
conv = df.groupby('values').sum()
result = conv.reset_index().values

The output is

array([[ 10.  ,   0.06],
       [ 11.  ,   0.16],
       [ 12.  ,   0.08],
       [ 13.  ,   0.13],
       [ 14.  ,   0.31],
       [ 15.  ,   0.06],
       [ 16.  ,   0.05],
       [ 17.  ,   0.15]])

With more than two variables, you don't have to go back and forth between numpy and pandas: the additional variables can be included at the beginning.

values = np.add.outer(np.add.outer(x[:,0], y[:,0]), z[:,0]).flatten()
probs = np.multiply.outer(np.multiply.outer(x[:,1], y[:,1]), z[:,1]).flatten()
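For larger N, the nested outer products grow with the product of all the support sizes. One possible alternative is to fold the variables in one at a time and merge duplicate sums after each step. The following is only a sketch under that assumption; convolve_pair is a hypothetical helper name, not a numpy or pandas function:

```python
from functools import reduce

import numpy as np
import pandas as pd


def convolve_pair(a, b):
    """Distribution of A + B for independent A and B, each given as [value, prob] rows.

    Hypothetical helper for illustration; it wraps the outer-product and groupby steps.
    """
    values = np.add.outer(a[:, 0], b[:, 0]).ravel()
    probs = np.multiply.outer(a[:, 1], b[:, 1]).ravel()
    df = pd.DataFrame({'values': values, 'probs': probs})
    # Merge rows that share the same sum before the next variable is folded in.
    return df.groupby('values')['probs'].sum().reset_index().to_numpy()


x = np.array([[0, 0.3], [1, 0.2], [3, 0.5]])
y = np.array([[10, 0.2], [11, 0.4], [13, 0.1], [14, 0.3]])
z = np.array([[21, 0.3], [23, 0.7]])

# Fold the list pairwise: ((x + y) + z), and so on for more variables.
total = reduce(convolve_pair, [x, y, z])
print(total)
```

Grouping after every step keeps each intermediate table no larger than the number of distinct sums reached so far, which should matter more as N grows.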

Aside: it would be better to keep values and probabilities in separate numpy arrays, if they have different intrinsic data types (integers vs reals).
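As a rough illustration of that aside, the same computation can be sketched with separate integer and float arrays and without pandas, using np.unique and np.bincount to merge duplicate sums. The function name add_pmfs and the variable names below are assumptions made for this example:

```python
import numpy as np


def add_pmfs(vals_a, probs_a, vals_b, probs_b):
    """PMF of A + B for independent A and B (hypothetical helper, for illustration).

    Values are integer arrays, probabilities are float arrays of matching length.
    """
    sums = np.add.outer(vals_a, vals_b).ravel()
    weights = np.multiply.outer(probs_a, probs_b).ravel()
    # inverse maps every pairwise sum to its position in the sorted unique support.
    support, inverse = np.unique(sums, return_inverse=True)
    # Add up the weights of all pairs that land on the same sum.
    probs = np.bincount(inverse, weights=weights)
    return support, probs


x_vals, x_probs = np.array([0, 1, 3]), np.array([0.3, 0.2, 0.5])
y_vals, y_probs = np.array([10, 11, 13, 14]), np.array([0.2, 0.4, 0.1, 0.3])

support, probs = add_pmfs(x_vals, x_probs, y_vals, y_probs)
print(support)
print(probs)
```

Here the support keeps its integer dtype, while the probabilities stay floating point.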


1 Comment

Andras Vanyolos Apr 22 at 21:56

Very nice! I have been searching for some time for a way to generate the probability mass function (PMF) of the sum of two independent discrete random variates, each having arbitrary support, not necessarily the non-negative integers. Every page I have seen so far deals only with the non-negative integers (e.g., Binomial or Poisson distribution), where the sum distribution is naturally obtained by np.convolve(pmf1, pmf2) of the individual PMF arrays and the support is again the non-negative integers. Using the outer products and DataFrame.groupby for the generic case is just genius!
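For comparison, the dense np.convolve route mentioned in the comment and in the question can be sketched as follows. It assumes every value is a non-negative integer and that no value repeats within a variable; to_dense is a hypothetical helper name:

```python
import numpy as np


def to_dense(pairs):
    """Turn [value, prob] rows into a dense PMF array indexed by value.

    Hypothetical helper; assumes non-negative integer values without repeats.
    """
    values = pairs[:, 0].astype(int)
    dense = np.zeros(values.max() + 1)
    dense[values] = pairs[:, 1]
    return dense


x = np.array([[0, 0.3], [1, 0.2], [3, 0.5]])
y = np.array([[10, 0.2], [11, 0.4], [13, 0.1], [14, 0.3]])

# Index k of the convolution holds P(X + Y = k).
dense_sum = np.convolve(to_dense(x), to_dense(y))

# Keep only the values that actually occur.
support = np.nonzero(dense_sum)[0]
result = np.column_stack([support, dense_sum[support]])
print(result)
```

This is simple when the supports are small and close to zero, but the dense arrays grow with the largest value rather than with the number of distinct values.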
