
I have N distributions that take integer values 0, 1, ... with associated probabilities. For example, take 3 variables, each stored as rows of [value, prob]:

import numpy as np
x = np.array([ [0,0.3],[1,0.2],[3,0.5] ])
y = np.array([ [10,0.2],[11,0.4],[13,0.1],[14,0.3] ])
z = np.array([ [21,0.3],[23,0.7] ])

Since there are N variables, I first convolve x and y, then add z, and so on. Unfortunately numpy.convolve() only takes 1-d arrays as input, so it doesn't apply here directly. My workaround is to pad every variable so it covers all values 0, 1, 2, ..., 23 (setting Pr = 0 for values it cannot take), but I feel there must be a much better solution.
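For integer-valued variables, the padding idea can be made systematic: store each PMF as a dense array that starts at its own smallest value, and carry that offset separately, so that `np.convolve` only ever sees 1-d arrays. This is a minimal sketch, not the question's code; `to_dense` and `add_dense` are hypothetical helper names, and it assumes every value is an integer and that max - min is small enough for a dense array to fit in memory.

```python
import numpy as np

def to_dense(dist):
    """Turn [value, prob] rows with integer values into (offset, pmf),
    where pmf[k] = P(X = offset + k)."""
    values = dist[:, 0].astype(int)
    offset = values.min()
    pmf = np.zeros(values.max() - offset + 1)
    # np.add.at accumulates, so repeated values would be summed correctly
    np.add.at(pmf, values - offset, dist[:, 1])
    return offset, pmf

def add_dense(a, b):
    """PMF of the sum of two independent integer-valued variables."""
    off_a, pmf_a = a
    off_b, pmf_b = b
    # The convolution is the PMF of the sum; its support starts at off_a + off_b
    return off_a + off_b, np.convolve(pmf_a, pmf_b)

x = np.array([[0, 0.3], [1, 0.2], [3, 0.5]])
y = np.array([[10, 0.2], [11, 0.4], [13, 0.1], [14, 0.3]])
z = np.array([[21, 0.3], [23, 0.7]])

offset, pmf = add_dense(add_dense(to_dense(x), to_dense(y)), to_dense(z))
support = offset + np.flatnonzero(pmf)  # drop entries that are only padding
print(np.column_stack([support, pmf[pmf > 0]]))
```

The cost grows with the width of the range rather than the number of support points, so this works well for compact integer supports and poorly for sparse ones such as {0, 10**6}.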

Does anyone have a suggestion for making it more efficient? Thanks in advance.

Answer


I don't see a built-in method for this in SciPy; there is a way to define a custom discrete random variable, but such variables don't support addition. Here is an approach using pandas, assuming import pandas as pd and x, y, z as in your example:

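import pandas as pd
# every pairwise sum of support points, and the matching joint probabilities
values = np.add.outer(x[:,0], y[:,0]).flatten()
probs = np.multiply.outer(x[:,1], y[:,1]).flatten()
df = pd.DataFrame({'values': values, 'probs': probs})
# merge equal sums by adding their probabilities
conv = df.groupby('values').sum()
result = conv.reset_index().values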

The output is

array([[ 10.  ,   0.06],
       [ 11.  ,   0.16],
       [ 12.  ,   0.08],
       [ 13.  ,   0.13],
       [ 14.  ,   0.31],
       [ 15.  ,   0.06],
       [ 16.  ,   0.05],
       [ 17.  ,   0.15]])
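If pandas isn't otherwise needed, the groupby step can be done with NumPy alone: `np.unique(..., return_inverse=True)` labels each distinct sum, and `np.bincount` with `weights` adds up the probabilities per label. This is a swapped-in technique rather than the answer's code; `add_independent` is a hypothetical name, and it takes values and probabilities as separate arrays.

```python
import numpy as np

def add_independent(v1, p1, v2, p2):
    """Distribution of X + Y for independent X and Y given as
    (support values, probabilities). Returns sorted unique values and their probs."""
    sums = np.add.outer(v1, v2).ravel()        # every pairwise sum of support points
    joint = np.multiply.outer(p1, p2).ravel()  # P(X = a) * P(Y = b) for each pair
    values, inverse = np.unique(sums, return_inverse=True)
    probs = np.bincount(inverse, weights=joint)  # add up probabilities of equal sums
    return values, probs

x_vals, x_probs = np.array([0, 1, 3]), np.array([0.3, 0.2, 0.5])
y_vals, y_probs = np.array([10, 11, 13, 14]), np.array([0.2, 0.4, 0.1, 0.3])

values, probs = add_independent(x_vals, x_probs, y_vals, y_probs)
print(values)  # [10 11 12 13 14 15 16 17]
print(probs)
```

The result should match the pandas output above up to floating-point rounding.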

With more than two variables, you don't have to go back and forth between numpy and pandas: the additional variables can be folded into the outer products before the DataFrame is built.

values = np.add.outer(np.add.outer(x[:,0], y[:,0]), z[:,0]).flatten()
probs = np.multiply.outer(np.multiply.outer(x[:,1], y[:,1]), z[:,1]).flatten()
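The nested outer product creates one entry per combination, i.e. the product of all support sizes, before anything is merged. With many variables it may be cheaper to merge after every addition, which is the "x + y first, then add z" plan from the question. A minimal sketch using `functools.reduce` and pandas `Series.groupby`; `add_pair` and `to_series` are hypothetical helpers, and integer values are assumed.

```python
from functools import reduce

import numpy as np
import pandas as pd

def to_series(dist):
    """[value, prob] rows -> Series of probabilities indexed by value."""
    return pd.Series(dist[:, 1], index=dist[:, 0].astype(int))

def add_pair(a, b):
    """Sum of two independent variables, each a Series mapping value -> prob."""
    values = np.add.outer(a.index.to_numpy(), b.index.to_numpy()).ravel()
    probs = np.multiply.outer(a.to_numpy(), b.to_numpy()).ravel()
    # Group equal sums and add their probabilities; groupby sorts the index
    return pd.Series(probs).groupby(values).sum()

x = np.array([[0, 0.3], [1, 0.2], [3, 0.5]])
y = np.array([[10, 0.2], [11, 0.4], [13, 0.1], [14, 0.3]])
z = np.array([[21, 0.3], [23, 0.7]])

# Merge one variable at a time, so each intermediate result holds only distinct sums
total = reduce(add_pair, map(to_series, [x, y, z]))
print(total)
```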

Aside: it would be better to keep values and probabilities in separate numpy arrays, if they have different intrinsic data types (integers vs reals).
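Stacking integer values and real probabilities into one array forces NumPy to choose a common dtype, so the values silently become float64. A short sketch of keeping them apart, using the question's x; the names `x_vals`, `x_probs` and `y_vals` are illustrative. For small integer-valued floats like these the sums are still exact, but integer keys remove any doubt, whereas with genuinely fractional values equal-looking sums may not compare equal when grouped.

```python
import numpy as np

x = np.array([[0, 0.3], [1, 0.2], [3, 0.5]])
print(x.dtype)  # float64: the integer values were upcast to match the probabilities

# Keep an integer array of values and a float array of probabilities
x_vals = x[:, 0].astype(np.int64)
x_probs = x[:, 1]

y_vals = np.array([10, 11, 13, 14], dtype=np.int64)
sums = np.add.outer(x_vals, y_vals)
print(sums.dtype)  # int64, so equal sums group exactly

# With fractional float values, sums that "should" match can differ
print(0.1 + 0.2 == 0.3)  # False
```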


Comment

Very nice! I had been searching for a while for a way to generate the probability mass function (PMF) of the sum of two independent discrete random variates, each with arbitrary support, not necessarily the non-negative integers. Every page I had seen dealt only with the non-negative integers (e.g., the Binomial or Poisson distribution), where the distribution of the sum is obtained directly by np.convolve(pmf1, pmf2) of the individual PMF arrays and the support is again the non-negative integers. Using np.add.outer / np.multiply.outer with DataFrame.groupby for the general case is just genius!
