【Copulas】Copula python(2)

Univariate Distributions

The Copulas library supports several univariate distributions through the Univariate subclasses defined in the copulas.univariate package:

  • copulas.univariate.BetaUnivariate: Beta distribution
  • copulas.univariate.GammaUnivariate: Gamma distribution
  • copulas.univariate.GaussianKDE: Kernel density estimate using a Gaussian kernel
  • copulas.univariate.GaussianUnivariate: Gaussian distribution

Demo: Beta Univariate

Generate samples from Beta distribution

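from copulas.datasets import sample_univariate_beta as SUB
from copulas.univariate import BetaUnivariate as BU
from copulas.visualization import compare_1d
import matplotlib.pyplot as plt

def func_betaUnivariate():
    # Load the demo Beta sample and plot its histogram
    data = SUB()
    plt.hist(data, bins=20)
    plt.show()
    return data

# Keep the sample at module level so the following snippets can use it
data = func_betaUnivariate()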

[Figure: beta-dist]

Generate samples from the fitted model

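# Fit a Beta distribution to the data
beta = BU()
beta.fit(data)
# print(beta._params)

# Sample synthetic data from the fitted model
nSample = 1000
samples = beta.sample(nSample)

# Compare the real and the synthetic data side by side
compare_1d(data, samples)
plt.show()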
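
The fit-then-sample workflow above can be illustrated without the library. The sketch below is not how Copulas fits a Beta distribution: it swaps in a simple method-of-moments estimate and assumes data on [0, 1] with no location or scale parameters. The helper name fit_beta_moments and the stand-in data are hypothetical, chosen only for illustration.

```python
import random
import statistics

def fit_beta_moments(sample):
    """Estimate (alpha, beta) of a Beta distribution on [0, 1] by matching mean and variance."""
    mean = statistics.fmean(sample)
    var = statistics.variance(sample)
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

rng = random.Random(0)

# Stand-in for the real data (assumption): 5000 draws from Beta(3, 1)
data = [rng.betavariate(3.0, 1.0) for _ in range(5000)]

# "fit": estimate the parameters from the data
alpha_hat, beta_hat = fit_beta_moments(data)

# "sample": draw synthetic values from the fitted distribution
samples = [rng.betavariate(alpha_hat, beta_hat) for _ in range(1000)]

print(alpha_hat, beta_hat)
print(statistics.fmean(data), statistics.fmean(samples))
```

If the fit is reasonable, the estimated parameters land close to the ones used to generate the data, and the real and synthetic means are close to each other.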


Probability density

The probability density of a Beta distribution is defined by
$$\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$$
where
$$B(\alpha, \beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
The probability density can be computed for an array of data points using the probability_density method; the code here calls it through its shorter alias, pdf.

from pandas import DataFrame

probability_density = beta.pdf(samples)

# plot the probability densities sorted by the samples
DataFrame({
    'data': samples,
    'probability_density': probability_density
}).sort_values('data').set_index('data').plot()
plt.show()
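
The density formula given above can also be evaluated by hand. The sketch below implements it with the standard library only, for the standard Beta distribution on [0, 1]; the fitted Copulas model may additionally carry location and scale parameters, which this sketch leaves out. The function names and the parameter values (alpha = 2, beta = 3) are illustrative assumptions.

```python
import math

def beta_function(alpha, beta):
    """B(alpha, beta) = Gamma(alpha) * Gamma(beta) / Gamma(alpha + beta)."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def beta_pdf(x, alpha, beta):
    """Density of the standard Beta distribution; assumes alpha, beta >= 1."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_function(alpha, beta)

# Example parameters chosen for illustration only
alpha, beta = 2.0, 3.0
b_value = beta_function(alpha, beta)      # Gamma(2) * Gamma(3) / Gamma(5) = 1 * 2 / 24
density = beta_pdf(0.5, alpha, beta)      # 0.5 * 0.25 / (1 / 12)
print(b_value, density)

# Midpoint Riemann sum: a density should integrate to about 1 over [0, 1]
n = 10_000
area = sum(beta_pdf((i + 0.5) / n, alpha, beta) for i in range(n)) / n
print(area)
```

For these parameters B(2, 3) = 1/12, so the density at x = 0.5 is 0.5 · 0.25 · 12 = 1.5.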


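from pandas import DataFrame
from copulas.datasets import sample_univariates as SU
from copulas.univariate import Univariate

def func_Univariate():
    # data = SUB()
    # univariate = Univariate()
    # univariate.fit(data)

    data = SU()
    # print(data.head())  # the dataset has columns that follow different distributions
    syn_data = DataFrame()
    distributions = []
    for col in data.columns:
        real_data = data[col]
        # Univariate searches for the best-fitting distribution for this column
        univariate = Univariate()
        univariate.fit(real_data)
        syn_data[col] = univariate.sample(len(real_data))
        distributions.append(univariate.to_dict()['type'])

    print(syn_data.head())
    print(distributions)
    # compare the real and the synthetic columns
    compare_1d(data, syn_data)
    plt.show()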


Selecting the best univariate

In some cases we want to restrict the search to a subset of the Univariate distributions. For example, we can fit the bimodal column of the dataset while considering only BetaUnivariate, GaussianUnivariate, and GammaUnivariate as candidates.

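from copulas.datasets import sample_univariates
from copulas.univariate import BetaUnivariate, GammaUnivariate, GaussianUnivariate, Univariate

data = sample_univariates()  # multi-column demo dataset; includes the 'bimodal' column
candidates = [BetaUnivariate, GaussianUnivariate, GammaUnivariate]
uni = Univariate(candidates=candidates)  # restrict the search to these candidates
uni.fit(data['bimodal'])
print(uni.to_dict())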
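
To make the idea of "searching for the best candidate" concrete, here is a minimal standard-library sketch. It assumes the selection criterion is the Kolmogorov-Smirnov statistic (fit every candidate, keep the one whose CDF is closest to the empirical CDF), which to my understanding is what Copulas does; the code is an illustration, not the library's implementation. The candidate fitters, the function names, and the exponential test data are all hypothetical.

```python
import math
import random
import statistics

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and the model `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x
        gap = max(gap, f - i / n, (i + 1) / n - f)
    return gap

def fit_normal(sample):
    """Fit mean and standard deviation; return the fitted normal CDF."""
    mu = statistics.fmean(sample)
    sigma = statistics.pstdev(sample)
    return lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def fit_exponential(sample):
    """Fit the rate as 1 / mean; return the fitted exponential CDF."""
    rate = 1.0 / statistics.fmean(sample)
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def select_best(sample, candidates):
    """Fit every candidate and return the name with the lowest KS statistic."""
    scores = {name: ks_statistic(sample, fit(sample)) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

rng = random.Random(0)
sample = [rng.expovariate(2.0) for _ in range(1000)]  # illustrative data

best_name, scores = select_best(
    sample, {'normal': fit_normal, 'exponential': fit_exponential}
)
print(best_name, scores)
```

Passing a shorter candidate list, as in the Copulas call above, simply shrinks the dictionary that this loop iterates over.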

The Univariate distributions are classified along two attributes:

  • Parametric: a distribution is either parametric or non-parametric
  • Bounded: a distribution is unbounded, semi-bounded, or bounded

When searching for the best distribution, instead of building and passing the list of candidates by hand, we can simply pass the parametric or bounded value that we want to use.

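from copulas.univariate import BoundedType, ParametricType, Univariate

# Only parametric distributions with bounded support are considered
univ = Univariate(
    parametric=ParametricType.PARAMETRIC,
    bounded=BoundedType.BOUNDED
)

univ.fit(data['bimodal'])
print(univ.to_dict())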
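
The effect of the parametric and bounded arguments can be pictured as a filter over a catalogue of distributions. The sketch below is a standalone illustration with hypothetical stand-in enums and a hand-written catalogue; it does not import or reproduce the Copulas classes, and the classification of each entry follows the mathematical support of the distribution.

```python
from enum import Enum

class ParametricKind(Enum):   # hypothetical stand-in for ParametricType
    NON_PARAMETRIC = 0
    PARAMETRIC = 1

class BoundedKind(Enum):      # hypothetical stand-in for BoundedType
    UNBOUNDED = 0
    SEMI_BOUNDED = 1
    BOUNDED = 2

# Illustrative catalogue: name -> (parametric kind, bounded kind)
CATALOGUE = {
    'Beta': (ParametricKind.PARAMETRIC, BoundedKind.BOUNDED),
    'Gamma': (ParametricKind.PARAMETRIC, BoundedKind.SEMI_BOUNDED),
    'Gaussian': (ParametricKind.PARAMETRIC, BoundedKind.UNBOUNDED),
    'GaussianKDE': (ParametricKind.NON_PARAMETRIC, BoundedKind.UNBOUNDED),
}

def select_candidates(parametric=None, bounded=None):
    """Keep the entries that match every attribute that was given."""
    return [
        name
        for name, (p_kind, b_kind) in CATALOGUE.items()
        if (parametric is None or p_kind is parametric)
        and (bounded is None or b_kind is bounded)
    ]

both = select_candidates(ParametricKind.PARAMETRIC, BoundedKind.BOUNDED)
parametric_only = select_candidates(parametric=ParametricKind.PARAMETRIC)
print(both)
print(parametric_only)
```

In this toy catalogue, asking for parametric and bounded leaves only the Beta entry, while asking for parametric alone keeps Beta, Gamma, and Gaussian.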

Reposted from blog.csdn.net/qq_18822147/article/details/118495517