copulas.datasets module

Sample datasets for the Copulas library.

copulas.datasets.sample_bivariate_age_income(size=1000, seed=42)

Sample from a bivariate toy dataset.

This dataset contains two columns of simulated age and income values, which are positively correlated and include outliers.

Parameters
  • size (int) – Number of samples to generate. Defaults to 1000.

  • seed (int) – Random seed to use. Defaults to 42.

Returns

DataFrame with two columns, age and income.

Return type

pandas.DataFrame
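The description above does not spell out the generating process. The sketch below is one hypothetical way to produce a comparable table with NumPy and pandas. The name toy_age_income, the beta-distributed ages, the log-shaped income curve, and the 10% outlier rate are all assumptions, not documented library behavior, and the values will not match sample_bivariate_age_income for the same seed.

```python
import numpy as np
import pandas as pd


def toy_age_income(size=1000, seed=42):
    """Hypothetical generator for a positively correlated age/income table with outliers."""
    rng = np.random.default_rng(seed)
    # Assumed: ages between 18 and 118, skewed toward younger values.
    age = 18 + 100 * rng.beta(2.0, 6.0, size=size)
    # Assumed: income grows with log(age), plus Gaussian noise.
    income = 100 * np.log(age) + rng.normal(0.0, 10.0, size=size)
    # Assumed: roughly 10% of rows are shrunk by a factor of 1000 to act as outliers.
    outliers = rng.random(size) < 0.1
    income[outliers] /= 1000
    return pd.DataFrame({"age": age, "income": income})


df = toy_age_income()
print(df.shape)                      # (1000, 2)
print(df["age"].corr(df["income"]))  # positive, but weakened by the outliers
```

Because the outlier rows are chosen independently of age, they dilute the correlation without reversing its sign.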

copulas.datasets.sample_trivariate_xyz(size=1000, seed=42)

Sample from a three-dimensional toy dataset.

The output is a DataFrame containing three columns:

  • x: Beta distribution with a=0.1 and b=0.1

  • y: Beta distribution with a=0.1 and b=0.5

  • z: Normal noise plus 10 times y (so z depends on y)

Parameters
  • size (int) – Number of samples to generate. Defaults to 1000.

  • seed (int) – Random seed to use. Defaults to 42.

Returns

DataFrame with three columns, x, y and z.

Return type

pandas.DataFrame
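To make the dependence between the columns concrete, here is a minimal NumPy sketch of the same recipe. The name toy_xyz is hypothetical, the noise in z is assumed to be standard normal, and NumPy's Generator API is used, so the numbers will differ from sample_trivariate_xyz even with the same seed.

```python
import numpy as np
import pandas as pd


def toy_xyz(size=1000, seed=42):
    """Sketch of the three-column x/y/z toy dataset."""
    rng = np.random.default_rng(seed)
    x = rng.beta(0.1, 0.1, size=size)  # both shape parameters < 1: density spikes at 0 and 1
    y = rng.beta(0.1, 0.5, size=size)  # mean 0.1 / 0.6, so most mass sits near 0
    # z depends on y: normal noise (assumed standard) plus 10 times y.
    z = rng.normal(0.0, 1.0, size=size) + 10 * y
    return pd.DataFrame({"x": x, "y": y, "z": z})


df = toy_xyz()
print(df.corr())  # z is strongly correlated with y; x is roughly independent of both
```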

copulas.datasets.sample_univariate_bernoulli(size=1000, seed=42)

Sample from a Bernoulli distribution with p=0.3.

The distribution is built by sampling a uniform random value and setting it to 1 if it is below 0.3 and to 0 otherwise.

Parameters
  • size (int) – Number of samples to generate. Defaults to 1000.

  • seed (int) – Random seed to use. Defaults to 42.

Returns

Series with the sampled values.

Return type

pandas.Series
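The thresholding trick described above can be sketched in a few lines. This is an illustration rather than the library's code: toy_bernoulli is a hypothetical name, and returning floats instead of integers is an assumption.

```python
import numpy as np
import pandas as pd


def toy_bernoulli(size=1000, seed=42, p=0.3):
    """Threshold uniform draws at p: values below p become 1, the rest become 0."""
    rng = np.random.default_rng(seed)
    u = rng.random(size)  # uniform on [0, 1)
    return pd.Series((u < p).astype(float))


s = toy_bernoulli()
print(s.mean())  # close to 0.3
```

Since P(U < p) = p for a uniform U on [0, 1), the fraction of ones converges to p.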

copulas.datasets.sample_univariate_beta(size=1000, seed=42)

Sample from a beta distribution with a=3, b=1, and loc=4.

Parameters
  • size (int) – Number of samples to generate. Defaults to 1000.

  • seed (int) – Random seed to use. Defaults to 42.

Returns

Series with the sampled values.

Return type

pandas.Series
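A shifted beta sample can be drawn by adding loc to a standard Beta(a, b) draw. In this hypothetical sketch (toy_beta is an illustrative name), scale is assumed to be 1 because the description above does not state it, which puts every value in [4, 5].

```python
import numpy as np
import pandas as pd


def toy_beta(size=1000, seed=42, a=3.0, b=1.0, loc=4.0, scale=1.0):
    """Beta(a, b) shifted by loc and stretched by scale (scale=1 is an assumption)."""
    rng = np.random.default_rng(seed)
    return pd.Series(loc + scale * rng.beta(a, b, size=size))


s = toy_beta()
print(s.min(), s.max())  # both inside [4, 5]
```

With a=3 and b=1 the density rises toward the upper end, so the sample mean sits near 4 + 3/4 = 4.75.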

copulas.datasets.sample_univariate_bimodal(size=1000, seed=42)

Sample from a bimodal distribution that mixes two Gaussians, centered at 0.0 and 10.0, each with stdev=1.

The distribution is built by sampling a standard normal and a normal with mean 10, then selecting one or the other for each sample based on a Bernoulli distribution.

Parameters
  • size (int) – Number of samples to generate. Defaults to 1000.

  • seed (int) – Random seed to use. Defaults to 42.

Returns

Series with the sampled values.

Return type

pandas.Series
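The mixing procedure can be written with a Bernoulli selector and np.where. The mixing probability is not given above, so p=0.3 here is an assumption, as is the choice of which mode it selects; toy_bimodal is a hypothetical name.

```python
import numpy as np
import pandas as pd


def toy_bimodal(size=1000, seed=42, p=0.3):
    """Mix N(0, 1) and N(10, 1), picking one per sample with a Bernoulli draw."""
    rng = np.random.default_rng(seed)
    pick_first = rng.random(size) < p  # Bernoulli(p) selector; p=0.3 is assumed
    mode1 = rng.normal(0.0, 1.0, size=size)
    mode2 = rng.normal(10.0, 1.0, size=size)
    # Elementwise choice: mode1 where the selector is True, mode2 elsewhere.
    return pd.Series(np.where(pick_first, mode1, mode2))


s = toy_bimodal()
print((s < 5).mean())  # roughly the share drawn from the mode at 0.0
```

The modes are ten standard deviations apart, so thresholding at 5 separates them almost perfectly.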

copulas.datasets.sample_univariate_degenerate(size=1000, seed=42)

Sample from a degenerate distribution that takes only a single value, chosen at random.

Parameters
  • size (int) – Number of samples to generate. Defaults to 1000.

  • seed (int) – Random seed to use. Defaults to 42.

Returns

Series with the sampled values.

Return type

pandas.Series
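A degenerate sample is one random draw repeated size times. This sketch assumes the value is drawn uniformly from [0, 1), which the description does not specify; toy_degenerate is a hypothetical name.

```python
import numpy as np
import pandas as pd


def toy_degenerate(size=1000, seed=42):
    """Draw one uniform value and repeat it for every sample."""
    rng = np.random.default_rng(seed)
    value = rng.random()  # the single value; the [0, 1) range is an assumption
    return pd.Series(np.full(size, value))


s = toy_degenerate()
print(s.nunique())  # 1
```

A constant column like this is useful for checking that fitting code handles zero-variance inputs.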

copulas.datasets.sample_univariate_exponential(size=1000, seed=42)

Sample from an exponential distribution with location 3.0 and rate 1.0.

Parameters
  • size (int) – Number of samples to generate. Defaults to 1000.

  • seed (int) – Random seed to use. Defaults to 42.

Returns

Series with the sampled values.

Return type

pandas.Series
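NumPy's exponential sampler is parameterized by scale (1 / rate) and always starts at 0, so the location is applied as a shift. A minimal sketch with a hypothetical toy_exponential function, which will not reproduce the library's values for a given seed:

```python
import numpy as np
import pandas as pd


def toy_exponential(size=1000, seed=42, loc=3.0, rate=1.0):
    """Exponential with the given rate, shifted so its support starts at loc."""
    rng = np.random.default_rng(seed)
    # NumPy expects scale, the reciprocal of the rate.
    return pd.Series(loc + rng.exponential(scale=1.0 / rate, size=size))


s = toy_exponential()
print(s.min())  # never below 3.0
```

The expected value is loc + 1 / rate, which is 4.0 here.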

copulas.datasets.sample_univariate_normal(size=1000, seed=42)

Sample from a normal distribution with mean 1 and stdev 1.

Parameters
  • size (int) – Number of samples to generate. Defaults to 1000.

  • seed (int) – Random seed to use. Defaults to 42.

Returns

Series with the sampled values.

Return type

pandas.Series
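For comparison, a normal sample with the same parameters can be drawn with NumPy's Generator API. toy_normal is a hypothetical helper, and its values will not match sample_univariate_normal for the same seed.

```python
import numpy as np
import pandas as pd


def toy_normal(size=1000, seed=42, mean=1.0, stdev=1.0):
    """Normal sample with the given mean and standard deviation."""
    rng = np.random.default_rng(seed)
    return pd.Series(rng.normal(mean, stdev, size=size))


s = toy_normal()
print(round(s.mean(), 2), round(s.std(), 2))  # both close to 1
```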

copulas.datasets.sample_univariate_uniform(size=1000, seed=42)

Sample from a uniform distribution in [-1.0, 3.0].

Parameters
  • size (int) – Number of samples to generate. Defaults to 1000.

  • seed (int) – Random seed to use. Defaults to 42.

Returns

Series with the sampled values.

Return type

pandas.Series
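The uniform case maps directly onto Generator.uniform. The hypothetical toy_uniform below draws from the half-open interval [-1.0, 3.0); for a continuous distribution the excluded endpoint has probability zero, so this matches the closed interval in practice.

```python
import numpy as np
import pandas as pd


def toy_uniform(size=1000, seed=42, low=-1.0, high=3.0):
    """Uniform sample on [low, high)."""
    rng = np.random.default_rng(seed)
    # Generator.uniform samples the half-open interval [low, high).
    return pd.Series(rng.uniform(low, high, size=size))


s = toy_uniform()
print(s.min(), s.max())  # inside [-1.0, 3.0)
```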

copulas.datasets.sample_univariates(size=1000, seed=42)

Sample from a list of univariate distributions.

Parameters
  • size (int) – Number of samples to generate. Defaults to 1000.

  • seed (int) – Random seed to use. Defaults to 42.

Returns

DataFrame with one column per sampled distribution.

Return type

pandas.DataFrame
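The column names of this DataFrame are not listed here, so the sketch below uses hypothetical names, one per univariate distribution described in this module, with the parameters stated in those descriptions (the bimodal mixing probability of 0.3 and the [0, 1) range of the degenerate value are assumptions). It draws every column from a single NumPy Generator; how the library seeds each column is not specified above.

```python
import numpy as np
import pandas as pd


def toy_univariates(size=1000, seed=42):
    """Collect several univariate samples as the columns of one DataFrame."""
    rng = np.random.default_rng(seed)
    # Hypothetical column names; each entry mirrors one distribution in this module.
    columns = {
        "bernoulli": (rng.random(size) < 0.3).astype(float),
        "bimodal": np.where(
            rng.random(size) < 0.3,  # assumed mixing probability
            rng.normal(0.0, 1.0, size=size),
            rng.normal(10.0, 1.0, size=size),
        ),
        "uniform": rng.uniform(-1.0, 3.0, size=size),
        "normal": rng.normal(1.0, 1.0, size=size),
        "degenerate": np.full(size, rng.random()),
        "exponential": 3.0 + rng.exponential(1.0, size=size),
        "beta": 4.0 + rng.beta(3.0, 1.0, size=size),
    }
    return pd.DataFrame(columns)


df = toy_univariates()
print(df.shape)  # (1000, 7)
```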