
I am attempting to write a function which should solve the integral shown below.

[image: the integral to be computed]

The data points that are used follow a power-law process and can become quite large. The data is simulated using the following code:

import numpy as np

def simulate_data_constant(beta1, beta2, lam1, sets, total_failures, change):
    # Shape parameters before and after the change point
    betha1 = beta1
    betha2 = beta2
    
    # Scale parameters (identical here)
    lambda1 = lam1
    lambda2 = lam1
    
    n_traj = sets           # number of simulated trajectories
    n = total_failures      # failures per trajectory
    x = change              # index of the change point
    
    t1 = np.zeros((n_traj, n))
    
    for k in range(n_traj):
        # First failure time, then successive times via inverse-transform sampling
        t1[k, 0] = (-np.log(np.random.uniform(0, 1))/lambda1)**(1/betha1)
        for j in range(1, x):
            t1[k, j] = (-np.log(np.random.uniform(0, 1))/lambda1 + (t1[k, j-1])**betha1)**(1/betha1)
        for j in range(x, n):
            t1[k, j] = (-np.log(np.random.uniform(0, 1))/lambda2 + (t1[k, j-1])**betha2)**(1/betha2)
    return t1
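
For illustration, a call could look like the following (the parameter values are hypothetical placeholders, not values from the question):

# Hypothetical example: 5 trajectories of 50 failures, change point at index 25
times_mat = simulate_data_constant(beta1=0.8, beta2=1.6, lam1=0.01,
                                   sets=5, total_failures=50, change=25)
print(times_mat.shape)  # (5, 50); each row is an increasing sequence of failure times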

To compute the integral I first used the function tplquad (from scipy.integrate), but Python gave me a message saying that the integral does not converge. I therefore attempted Monte Carlo integration with the following code:

def K1_log(tau, beta1, beta2, n, eps, v, T):
    # log of the K1 term; a clipping threshold was tried and then disabled
    K1_threshold = 700
    val = (n+eps)*np.log(np.power(tau, beta1)+np.power(T, beta2)-np.power(tau, beta2)+v)
    #return np.where(val >= K1_threshold, K1_threshold, val)
    return val

def q_j_log(j, beta1, beta2, times):
    prod1_log = beta1*np.sum(np.log(times[:j]))
    prod2_log = beta2*np.sum(np.log(times[j:]))
    return prod1_log + prod2_log

def integral1_log(beta1, beta2, tau, n, eps, v, T, times):
    K1_values = K1_log(tau, beta1, beta2, n, eps, v, T)
    log_prod_term = beta2*np.sum(np.log(times))
    print(K1_values - log_prod_term)  # debugging output
    return 2*np.log(2) - 3*(np.log(1+beta1) + np.log(1+beta2)) + n*np.log(beta2) + log_prod_term - K1_values - np.log(T)

def integral2_log(beta1, beta2, tau, j, n, eps, v, T, times):
    K1_values = K1_log(tau, beta1, beta2, n, eps, v, T)
    q_j_values = q_j_log(j, beta1, beta2, times)
    return 2*np.log(2) - 3*(np.log(1+beta1) + np.log(1+beta2)) + j*np.log(beta1) + (n-j)*np.log(beta2) + q_j_values - K1_values - np.log(T)

def integral3_log(beta1, beta2, tau, n, eps, v, T, times):
    K1_values = K1_log(tau, beta1, beta2, n, eps, v, T)
    log_prod_term = beta1*np.sum(np.log(times))
    return 2*np.log(2) - 3*(np.log(1+beta1) + np.log(1+beta2)) + n*np.log(beta1) + log_prod_term - K1_values - np.log(T)

import math   # needed for math.ceil below

def Bayes(times_mat, sets, total_failures):
    n_traj = sets
    n = total_failures
    v = 1
    eps = 0.1
    num_samples = 1000
    
    for k in range(n_traj):
        times = times_mat[k]
        T = math.ceil(times[-1])
    
        # Piece 1: tau in (0, times[0])
        samples_tau = np.random.uniform(0, times[0], num_samples)
        samples_beta1 = np.random.uniform(0, 3, num_samples)
        samples_beta2 = np.random.uniform(0, 3, num_samples)
        integral_1_values_log = integral1_log(samples_beta1, samples_beta2, samples_tau, n, eps, v, T, times)
        integral_1_values = np.exp(integral_1_values_log)
        A1 = np.mean(integral_1_values)*times[0]
        
        # Piece 2: tau in (times[j], times[j+1]) for the interior intervals
        A2 = 0
        for j in range(1, n-1):
            samples_tau = np.random.uniform(times[j], times[j+1], num_samples)  # num_samples was missing here
            integral_2_values_log = integral2_log(samples_beta1, samples_beta2, samples_tau, j, n, eps, v, T, times)
            integral_2_values = np.exp(integral_2_values_log)
            A2 += np.mean(integral_2_values) * (times[j+1]-times[j])
        
        # Piece 3: tau in (times[-1], T)
        samples_tau = np.random.uniform(times[-1], T, num_samples)
        integral_3_values_log = integral3_log(samples_beta1, samples_beta2, samples_tau, n, eps, v, T, times)
        integral_3_values = np.exp(integral_3_values_log)
        A3 = np.mean(integral_3_values) * (T-times[-1])
        A = A1 + A2 + A3

The implementation above works with a logarithmic transformation of the calculations, because the values of x_i were too large and produced inf values. This works for some of the simulated data sets, but for others I still get inf as the outcome. Is there a different way to compute this integral that would work for all simulated data?
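
For reference, one way to stay in log space until the very end (a sketch of the general log-sum-exp idea, not code taken from the question) is to average the Monte Carlo samples with scipy.special.logsumexp instead of exponentiating each sample:

import numpy as np
from scipy.special import logsumexp

def log_mc_mean(log_values, width):
    """Return log( mean(exp(log_values)) * width ) without overflowing.

    log_values: log-integrand evaluated at the Monte Carlo samples
    width:      length of the integration interval for tau
    """
    return logsumexp(log_values) - np.log(len(log_values)) + np.log(width)

# Hypothetical usage with the question's integral1_log output:
# log_A1 = log_mc_mean(integral_1_values_log, times[0])
# Keeping every interval's contribution as a log value and combining them at the
# end with logsumexp avoids ever calling np.exp on a huge log-integrand.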

3 Comments
  • How large are the "too large" numbers? Maybe this SO question can help you Commented Sep 20, 2024 at 8:30
  • Floating point numbers have limited precision and range. Your integrals are too complicated for my brain to process by eyeballing them, but if it were me, I would attempt to drastically simplify them to get crude bounds for the result. A crude lower bound could inform you that the result is too large and that you'll need to compute it differently, perhaps using extended-precision numeric types, or perhaps by computing log(A) instead, or maybe log(log(A)), etc. Commented Sep 20, 2024 at 14:56
  • If you could provide me with T, n, epsilon, v, x1 and product(x_i), I could try to compute the first integral Commented Sep 21, 2024 at 2:31

1 Answer


First, you could have a problem in your code, which is

-np.log(np.random.uniform(0,1))

Why don't you use exponential distribution sampling? Or at least replace np.random.uniform(0,1) with 1-np.random.uniform(0,1), so that a uniform draw of exactly 0 cannot send the logarithm to infinity. All the sampling in simulate_data_constant uses the exponential distribution, not a power law.
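
For example (a sketch, assuming the intent is simply to replace the inverse-transform expression), NumPy's built-in exponential sampler gives the same distribution as -log(U)/lambda directly:

import numpy as np

rng = np.random.default_rng()
lambda1, betha1 = 0.01, 0.8   # hypothetical values, like in the simulation above

# -np.log(np.random.uniform(0, 1)) / lambda1 is an Exponential(rate=lambda1) draw,
# so it can be replaced by an exponential sampler with scale = 1/lambda1:
first_time = rng.exponential(scale=1.0 / lambda1) ** (1.0 / betha1)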

You could try to sample β1 and β2 from their PDFs while doing the MC integration. This is a general principle in MC: if you have an integral

∫ f(x) g(x) dx,

and g(x) is (or could be made into) some PDF(x), you can sample x from this PDF and just compute the mean of f(x) as the value of the integral.

This will bring serious improvements in convergence because most samples will concentrate closer to 0. τ would still be sampled uniformly.
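
As a toy illustration of the principle (this is not the original integral, just a hypothetical one-dimensional example with f(x) = exp(-x) and g(x) = 2/(1+x)³):

import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Estimate I = ∫ exp(-x) * 2/(1+x)^3 dx over (0, ∞) by sampling x from the PDF 2/(1+x)^3
u = 1.0 - rng.random(N)            # uniform in (0, 1]
x = np.sqrt(1.0 / u) - 1.0         # inverse-CDF sample from 2/(1+x)^3
estimate = np.mean(np.exp(-x))     # mean of f(x) over PDF samples approximates I

print(estimate)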

So rewrite your integrals as

I = ∫∫∫ PDF(β1) PDF(β2) Leftovers(β1, β2, τ, …) dβ1 dβ2 dτ

PDF(β1) = 2/(1 + β1)³

PDF(β2) = 2/(1 + β2)³

Sampling is simple:

β1,2 = √(1/U01) − 1, where U01 is a uniform number in the (0...1] range (this follows from inverting the CDF 1 − 1/(1+β)², with U01 playing the role of 1 − U).

Some code, Python 3.11 Windows x64:

import numpy as np
from matplotlib import pyplot as plt 

def BetaPDF(beta):
    # target density 2/(1 + beta)^3
    return 2.0/(1.0 + beta)**3

def sampleBetaPDF(rng, N):
    # inverse-CDF sampling from BetaPDF
    u01 = rng.random(size=N)
    u01 = 1.0 - u01 # to make [0...1) -> (0...1] range to avoid divbyzero
    beta = np.sqrt(1.0 / u01) - 1.0
    return beta
    
rng = np.random.default_rng(122807528840384100672342137672332424406)

N    = 100000
rnge = (0.0, 10.0)

bins = np.linspace(start = rnge[0], stop = rnge[1], num=101)    

data = sampleBetaPDF(rng, N)

hist, bin_edges = np.histogram(data, bins=bins, density=True)

print(hist.sum())
print(np.sum(hist * np.diff(bin_edges)))

fig = plt.figure(figsize =(10, 7))
 
plt.hist(data, bins = bins, density = True)

beta = 0.5 * (bin_edges[:-1] + bin_edges[1:]) 
bpdf = BetaPDF(beta)

plt.scatter(beta, bpdf, c='#9467bd')
 
plt.title("Beta Sampling vs PDF") 
 
plt.show()

And it will produce the graph below.

[image: "Beta Sampling vs PDF" — histogram of the sampled β values overlaid with BetaPDF]
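
Assuming the 2·log 2 − 3·(log(1+β1) + log(1+β2)) term in the question's log-integrands is exactly log(PDF(β1)·PDF(β2)), and that the β integrals run over (0, ∞), plugging the sampler into the first integral would look roughly like this (a sketch that reuses the question's names, not a drop-in replacement):

# Draw beta1, beta2 from their PDFs instead of uniformly on (0, 3)
samples_beta1 = sampleBetaPDF(rng, num_samples)
samples_beta2 = sampleBetaPDF(rng, num_samples)
samples_tau = rng.uniform(0, times[0], num_samples)   # tau is still uniform

# log of "Leftovers" = original log-integrand minus log(PDF(beta1)*PDF(beta2))
log_leftovers = (integral1_log(samples_beta1, samples_beta2, samples_tau, n, eps, v, T, times)
                 - (2*np.log(2) - 3*(np.log(1 + samples_beta1) + np.log(1 + samples_beta2))))

# Importance-sampled estimate of the first piece (the tau interval width still applies);
# the log-sum-exp trick from above can still be used instead of np.exp here.
A1 = np.mean(np.exp(log_leftovers)) * times[0]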


2 Comments

np.random.uniform(0,1) and 1-np.random.uniform(0,1) have the same distribution unless I'm missing something.
@MadPhysicist You are correct. It makes no distributional difference which you use. However, using 1-U produces a true inversion, where low or high U values map respectively to low and high exponential outcomes. Using U flip-flops this. This can be leveraged by using the same U both ways, producing negatively correlated outcomes, and averaging the results to yield an unbiased estimate of the mean which has lower variance than independent sampling. This technique is known as antithetic sampling.
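
A minimal sketch of that idea (a hypothetical example, estimating the mean of an Exponential(1) variable):

import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, 10_000)
u = np.clip(u, 1e-12, 1.0)           # guard against log(0) from the half-open [0, 1) range

# Antithetic pair: use U and 1-U on the same draws, then average the two estimates
x  = -np.log(u)                      # Exponential(1) via U
xa = -np.log1p(-u)                   # Exponential(1) via 1-U (log1p for accuracy)
antithetic_estimate = np.mean(0.5 * (x + xa))
print(antithetic_estimate)           # close to 1, with lower variance than either alone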
