Univariate Distributions
The Copulas library supports several univariate distributions through the Univariate subclasses defined in the copulas.univariate package:
- copulas.univariate.BetaUnivariate: Beta distribution
- copulas.univariate.GammaUnivariate: Gamma distribution
- copulas.univariate.GaussianKDE: Kernel Density Estimate using a Gaussian kernel
- copulas.univariate.GaussianUnivariate: Gaussian distribution
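Of these four, GaussianKDE is the only non-parametric one. Instead of fitting a handful of parameters, it places a Gaussian bump on every data point and averages the bumps. The following is a minimal standard-library sketch of that idea. The function name gaussian_kde_pdf is hypothetical. The bandwidth rule is also an assumption: it uses Scott's rule, with kernel standard deviation = sample std × n^(-1/5), which is the default of scipy.stats.gaussian_kde. This is not a statement about how Copulas configures its GaussianKDE.

```python
import math
import random
import statistics


def gaussian_kde_pdf(x, data):
    """Kernel density estimate at x using a Gaussian kernel (hypothetical helper).

    Assumption: bandwidth from Scott's rule, std * n ** (-1/5), as in
    scipy.stats.gaussian_kde's default; Copulas' GaussianKDE may differ.
    """
    n = len(data)
    h = statistics.stdev(data) * n ** (-1 / 5)  # kernel standard deviation
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    # Average of Gaussian bumps centred on every data point
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(0, 1) for _ in range(500)]
    for x in (-2, 0, 2):
        print(x, round(gaussian_kde_pdf(x, sample), 3))
```

A larger bandwidth gives a smoother curve. A smaller one follows the individual sample points more closely.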
Demo: Beta Univariate
Generate samples from a Beta distribution
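from copulas.datasets import sample_univariate_beta as SUB
from copulas.univariate import BetaUnivariate as BU
from copulas.visualization import compare_1d
import matplotlib.pyplot as plt


def func_betaUnivariate():
    # Generate sample data from a Beta distribution and plot its histogram
    data = SUB()
    plt.hist(data, bins=20)
    plt.show()

    # Fit a Beta model to the data, then generate samples from the fitted model
    beta = BU()
    beta.fit(data)
    # print(beta.to_dict())  # inspect the fitted parameters

    # Sampling data
    nSample = 1000
    samples = beta.sample(nSample)

    # Compare the real data with the synthetic samples
    compare_1d(data, samples)
    plt.show()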
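In this step, fit estimates the Beta parameters from the data and sample draws new points from the fitted distribution. The following sketch reproduces that fit-then-sample cycle using only the standard library. It swaps in the method of moments as the estimator and assumes the data lie strictly inside (0, 1). This is not necessarily how Copulas' BetaUnivariate estimates its parameters, and the class name MomentBeta is hypothetical.

```python
import random
import statistics


class MomentBeta:
    """Hypothetical Beta model fitted by the method of moments (data in (0, 1))."""

    def fit(self, data):
        m = statistics.fmean(data)
        v = statistics.variance(data)
        # For Beta(a, b): mean = a / (a + b) and var = mean * (1 - mean) / (a + b + 1)
        common = m * (1 - m) / v - 1  # solves the variance equation for a + b
        self.a = m * common
        self.b = (1 - m) * common
        return self

    def sample(self, n, seed=None):
        # Draw n values from the fitted Beta(a, b)
        rng = random.Random(seed)
        return [rng.betavariate(self.a, self.b) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.betavariate(2, 5) for _ in range(20000)]
    model = MomentBeta().fit(data)
    print(round(model.a, 2), round(model.b, 2))  # should land near 2 and 5
    samples = model.sample(1000, seed=0)
```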
Probability density
The probability density of a Beta distribution is defined by
$$\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$$
where
$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
The probability density can be computed for an array of data points using the probability_density method.
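import matplotlib.pyplot as plt
from pandas import DataFrame

# Continues the Beta demo: `beta` is the fitted BetaUnivariate, `samples` its draws
probability_density = beta.probability_density(samples)
# plot the probability densities sorted by the samples
DataFrame({
    'data': samples,
    'probability_density': probability_density,
}).sort_values('data').set_index('data').plot()
plt.show()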
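The density formula can also be evaluated directly, which gives a quick sanity check on the values returned by probability_density. This sketch uses a hypothetical function named beta_pdf. It computes B(α, β) through math.lgamma, so the Gamma functions are handled in log space and do not overflow for large shape parameters. It covers only the standard Beta on (0, 1), without any location or scale shift that a fitted model may carry.

```python
import math


def beta_pdf(x, a, b):
    """Standard Beta density x**(a-1) * (1-x)**(b-1) / B(a, b) on (0, 1)."""
    if not 0 < x < 1:
        return 0.0
    # log B(a, b) = log Γ(a) + log Γ(b) - log Γ(a + b); logs avoid overflow
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_b)


if __name__ == "__main__":
    print(round(beta_pdf(0.5, 2, 2), 6))  # 1.5
```

For α = β = 2, B(2, 2) = Γ(2)Γ(2)/Γ(4) = 1/6, so the density at 0.5 is 0.25 / (1/6) = 1.5.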
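from copulas.datasets import sample_univariates as SU
from copulas.univariate import Univariate
from copulas.visualization import compare_1d
import matplotlib.pyplot as plt
from pandas import DataFrame


def func_Univariate():
    # data = SUB()
    # univariate = Univariate()
    # univariate.fit(data)
    data = SU()
    # print(data.head())  # SU() generates columns that follow several different distributions
    syn_data = DataFrame()
    distributions = []
    for col in data.columns:
        real_data = data[col]
        # Univariate() with no arguments selects the best-fitting distribution for this column
        univariate = Univariate()
        univariate.fit(real_data)
        syn_data[col] = univariate.sample(len(real_data))
        distributions.append(univariate.to_dict()['type'])
    print(syn_data.head())
    print(distributions)
    # compare figures
    compare_1d(data, syn_data)
    plt.show()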
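When Univariate() is fitted without arguments, it tries its candidate distributions and keeps the one that fits best. The following sketch shows one way such a selection can work. It assumes the fits are ranked by the Kolmogorov-Smirnov statistic, which is the largest vertical gap between the empirical CDF and the fitted CDF. The three candidate families are hypothetical stand-ins, as are the names fit_gaussian, fit_uniform, fit_exponential, ks_statistic and select_best. They illustrate the idea and are not Copulas' API.

```python
import math
import random
import statistics


# Hypothetical candidate families: each one fits the data and returns its CDF.
def fit_gaussian(data):
    mu, sigma = statistics.fmean(data), statistics.stdev(data)
    return lambda x: 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def fit_uniform(data):
    lo, hi = min(data), max(data)
    return lambda x: min(max((x - lo) / (hi - lo), 0.0), 1.0)


def fit_exponential(data):
    loc = min(data)
    scale = statistics.fmean(data) - loc
    return lambda x: (1 - math.exp(-(x - loc) / scale)) if x > loc else 0.0


CANDIDATES = {"gaussian": fit_gaussian, "uniform": fit_uniform,
              "exponential": fit_exponential}


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of data and the fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, f - i / n, (i + 1) / n - f)
    return d


def select_best(data, candidates=CANDIDATES):
    """Fit every candidate and return the name with the smallest KS statistic."""
    scores = {name: ks_statistic(data, fit(data)) for name, fit in candidates.items()}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    rng = random.Random(0)
    print(select_best([rng.gauss(0, 1) for _ in range(5000)]))
```

Passing a smaller dictionary, for example select_best(data, {"gaussian": fit_gaussian}), narrows the search in the same spirit as the candidates argument of Univariate.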
Selecting the best univariate
In some cases, the search needs to be restricted to only some of the Univariate distributions. For example, the search over the bimodal column of the dataset can be restricted to BetaUnivariate, GaussianUnivariate and GammaUnivariate.
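from copulas.datasets import sample_univariates
from copulas.univariate import BetaUnivariate, GammaUnivariate, GaussianUnivariate, Univariate

data = sample_univariates()
candidates = [BetaUnivariate, GaussianUnivariate, GammaUnivariate]
uni = Univariate(candidates=candidates)  # restrict the search to these candidates
uni.fit(data['bimodal'])
print(uni.to_dict())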
The Univariate families are organized as follows:
- Parametric: distributions can be either non-parametric or parametric.
- Bounded: distributions can be either unbounded, semi-bounded or bounded.
When searching for the best distribution, we do not have to build the list of candidates and pass it in by hand. Instead, we can simply pass the parametric or bounded value that we want to use (a member of the ParametricType or BoundedType enum).
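from copulas.datasets import sample_univariates
from copulas.univariate import BoundedType, ParametricType, Univariate

data = sample_univariates()
univ = Univariate(
    parametric=ParametricType.PARAMETRIC,
    bounded=BoundedType.BOUNDED,
)
univ.fit(data['bimodal'])
print(univ.to_dict())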
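The parametric and bounded arguments act as filters over the available families. The following sketch models that filtering as a lookup table, under several assumptions. The enums are redefined locally with the same names used above, so the example runs without Copulas. Each family's tag comes from its mathematical support. The rule that both criteria must match exactly, with None meaning "no constraint", is an assumption about how the library applies the two arguments. The name select_candidates is hypothetical.

```python
from enum import Enum


# Local stand-ins mirroring the enum names used above (not imported from Copulas)
class ParametricType(Enum):
    NON_PARAMETRIC = 0
    PARAMETRIC = 1


class BoundedType(Enum):
    UNBOUNDED = 0
    SEMI_BOUNDED = 1
    BOUNDED = 2


# Each family tagged by (parametric?, kind of support)
FAMILIES = {
    "BetaUnivariate": (ParametricType.PARAMETRIC, BoundedType.BOUNDED),  # finite interval
    "GammaUnivariate": (ParametricType.PARAMETRIC, BoundedType.SEMI_BOUNDED),  # [loc, inf)
    "GaussianUnivariate": (ParametricType.PARAMETRIC, BoundedType.UNBOUNDED),  # whole line
    "GaussianKDE": (ParametricType.NON_PARAMETRIC, BoundedType.UNBOUNDED),  # whole line
}


def select_candidates(parametric=None, bounded=None):
    """Return the families matching every given criterion; None means 'any'."""
    return [
        name
        for name, (p, b) in FAMILIES.items()
        if (parametric is None or p == parametric) and (bounded is None or b == bounded)
    ]


if __name__ == "__main__":
    print(select_candidates(ParametricType.PARAMETRIC, BoundedType.BOUNDED))  # ['BetaUnivariate']
```

Among the four families listed at the start of this page, only BetaUnivariate is both parametric and bounded. The real library may register more families, so its filtered list can be longer.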