How to show that the area under the PDF equals 1 for your case.
For the discrete (histogram) case:
import numpy as np
x = np.random.normal(size=1000)
x=x*0.7
hist, bin_edges = np.histogram(x, density=True)
##print(hist.sum())
print(np.sum(hist * np.diff(bin_edges)))
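The same check with deliberately uneven bins makes it clearer why the bin widths matter: the heights alone do not need to sum to 1, but height times width does. This is only a sketch; the seed and the bin edges are arbitrary choices made for illustration, not part of the original data.

```python
import numpy as np

rng = np.random.default_rng(0)      # seed chosen only for reproducibility
x = 0.7 * rng.normal(size=1000)

# Hypothetical, deliberately uneven bin edges that cover every sample
edges = np.array([x.min(), -1.0, -0.3, 0.0, 0.2, 1.0, x.max()])
hist, bin_edges = np.histogram(x, bins=edges, density=True)

heights_sum = hist.sum()                        # sum of densities, not a probability
area = np.sum(hist * np.diff(bin_edges))        # density * width, summed over bins
print(heights_sum, area)
```

With `density=True` each height is `count / (N * width)`, so multiplying by the width and summing gives back `sum(count) / N`.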
or:
import matplotlib.pyplot as plt
n, bins, patches = plt.hist(x, bins=10, density=True, edgecolor='black', lw=3, fc=(0, 0, 1, 0.5), alpha=0.2) # color='maroon',
plt.hist(x, bins=10, cumulative=True, lw=3, fc=(0, 0, 0.5, 0.3), log=True) # fc= RGBA
##print(n, bins, patches.datavalues)
# n already holds densities because density=True; for equal-width bins this renormalization leaves n unchanged
density = n / (sum(n) * np.diff(bins))
##print(density)
# the area (integral) under the histogram is 1 up to floating-point rounding
print(np.sum(density * np.diff(bins)))
print(np.allclose(np.sum(density * np.diff(bins)) , 1))
For the continuous case:
# https://stackoverflow.com/a/59096585/15893581
# Calculate a KDE, then use the KDE as if it were a PDF
from scipy.stats import gaussian_kde
kde = gaussian_kde(x)
# integrate the KDE over the whole real line to get the total probability
print(kde.integrate_box_1d( -np.inf, np.inf))
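As a cross-check, the KDE can also be evaluated on a grid and integrated by hand, and `integrate_box_1d` can give the probability of a finite interval. A minimal sketch, assuming a seeded sample in place of the original `x`; the grid padding of 3.0 and the interval (-0.7, 0.7) are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)      # seed chosen only for reproducibility
x = 0.7 * rng.normal(size=1000)
kde = gaussian_kde(x)

# Total probability over the whole real line
total = kde.integrate_box_1d(-np.inf, np.inf)

# Cross-check: evaluate the KDE on a wide finite grid and apply the
# trapezoid rule by hand (padding of 3.0 is an assumption, far wider
# than the kernel bandwidth for this sample)
grid = np.linspace(x.min() - 3.0, x.max() + 3.0, 2001)
dens = kde(grid)
area_grid = np.sum((dens[1:] + dens[:-1]) * np.diff(grid)) / 2.0

# Probability of a finite interval, here roughly one sigma either side of 0
p_interval = kde.integrate_box_1d(-0.7, 0.7)
print(total, area_grid, p_interval)
```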
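or, as was suggested:
import matplotlib.pyplot as plt
counts_, bins_, patches_ = plt.hist(x, bins=10, density=True)
# normalize the bin heights so they sum to 1 (per-bin probabilities for equal-width bins)
pdf = np.array(counts_/sum(counts_))
# note: with dx=1 the trapezoid rule returns pdf.sum() - (pdf[0] + pdf[-1])/2, so the result is only approximately 1
# (np.trapz is named np.trapezoid in NumPy 2.0+)
print(np.trapz(pdf, x=None, dx=1.0, axis=-1))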
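or, for a normal distribution, you can integrate the analytic PDF directly:
from scipy.integrate import quad
fun = lambda x: np.exp(-x**2/2)/(np.sqrt(2*np.pi))  # standard normal PDF
y, err = quad(fun, -np.inf, np.inf)  # quad accepts infinite limits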
print(y)
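Since the samples above were scaled by 0.7, the matching analytic density would be a normal with standard deviation 0.7. The sketch below assumes that (zero mean, `sigma = 0.7`) and uses `scipy.stats.norm` rather than a hand-written lambda; it also integrates over one standard deviation either side of the mean as a sanity check.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma = 0.7                          # assumption: matches the x * 0.7 scaling
pdf = norm(loc=0.0, scale=sigma).pdf

# Area under the whole PDF
area, err = quad(pdf, -np.inf, np.inf)

# Probability within one standard deviation of the mean
p_1sigma, _ = quad(pdf, -sigma, sigma)
print(area, p_1sigma)
```

The one-sigma probability should come out close to the familiar 68% figure regardless of the value of `sigma`.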
or using rv_histogram:
from scipy.stats import rv_histogram
r = rv_histogram(np.histogram(x, bins=100))
print(r.pdf(np.linspace(0,1,5)))  # density of the histogram distribution at 5 points
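To tie `rv_histogram` back to the "area equals 1" question, its `cdf` and `pdf` can be checked directly. A minimal sketch, again assuming a seeded sample in place of the original `x`:

```python
import numpy as np
from scipy.stats import rv_histogram

rng = np.random.default_rng(0)      # seed chosen only for reproducibility
x = 0.7 * rng.normal(size=1000)

counts, edges = np.histogram(x, bins=100)
r = rv_histogram((counts, edges))

# Total probability between the first and last bin edge
total = r.cdf(edges[-1]) - r.cdf(edges[0])

# Same check via the piecewise-constant pdf: height at each bin center times width
centers = (edges[:-1] + edges[1:]) / 2.0
area = np.sum(r.pdf(centers) * np.diff(edges))

# Probability of a finite interval from the CDF
p_interval = r.cdf(0.7) - r.cdf(-0.7)
print(total, area, p_interval)
```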
- for a custom distribution see here and integrate as previously mentioned
- CDF from PDF: here
- but I agree with the comment here by @My Work: "it will always return something, the question is what"
- here, for the normal distribution: "scipy also has a CDF function that returns the integral from -inf to x":
scipy.stats.norm.cdf(np.inf)  # 1.0
- from CDF to PDF: use the derivative
dx = 0.1; pdf = np.gradient(cdf, dx), where cdf holds CDF values sampled at spacing dx, because PDF(x) = d CDF(x)/dx, meaning that the probability density is the rate of change of the CDF
- Performance considerations
normed=True is deprecated; density=True is enough to make the histogram integrate to 1 (sum of bin height × bin width) when binning the data. You can also look at np.digitize if you would like to assign samples to bins yourself.
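The CDF-to-PDF relation from the list above can be checked numerically for the standard normal. This is a sketch under assumed settings: the grid range of -5 to 5 and the 1001 points are arbitrary choices, and the trapezoid rule is written out by hand to avoid depending on a particular NumPy version.

```python
import numpy as np
from scipy.stats import norm

grid = np.linspace(-5.0, 5.0, 1001)
dx = grid[1] - grid[0]

cdf = norm.cdf(grid)                 # CDF sampled on the grid
pdf_numeric = np.gradient(cdf, dx)   # PDF(x) = d CDF(x) / dx

# Compare with the analytic PDF
max_err = np.max(np.abs(pdf_numeric - norm.pdf(grid)))

# Area under the recovered PDF (hand-written trapezoid rule)
area = np.sum((pdf_numeric[1:] + pdf_numeric[:-1]) * dx) / 2.0
print(max_err, area)
```

The recovered area is slightly below 1 only because the grid stops at ±5 instead of covering the whole real line.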