copulas.multivariate package
Submodules
Module contents
Multivariate copulas module.
-
class
copulas.multivariate.
Multivariate
(random_state=None)[source]¶ Bases:
object
Abstract class for a multivariate copula object.
-
cdf
(X)[source]¶ Compute the cumulative distribution value for each point in X.
- Parameters
X (pandas.DataFrame) – Values for which the cumulative distribution will be computed.
- Returns
Cumulative distribution values for points in X.
- Return type
numpy.ndarray
- Raises
NotFittedError – if the model is not fitted.
-
check_fit
()[source]¶ Check whether this model has already been fit to a random variable.
Raise a NotFittedError if it has not.
- Raises
NotFittedError – if the model is not fitted.
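The fit guard described here can be sketched without the library. The `NotFittedError` and `TinyModel` classes below are stand-ins defined only for this illustration (they are assumptions, not the library's code); they mirror the documented `fitted = False` class attribute and the check-then-raise behaviour.

```python
class NotFittedError(Exception):
    """Stand-in for the library's exception, defined locally for this sketch."""


class TinyModel:
    """Hypothetical model that mimics the documented fit guard."""

    fitted = False  # class-level default, as documented for Multivariate

    def fit(self, rows):
        # Store the column means as a toy "parameter" and mark the model as fitted.
        n_rows = len(rows)
        self.means = [sum(column) / n_rows for column in zip(*rows)]
        self.fitted = True

    def check_fit(self):
        # Raise if fit() has not been called on this instance.
        if not self.fitted:
            raise NotFittedError("This model is not fitted.")

    def sample(self):
        # Every method that needs fitted parameters calls the guard first.
        self.check_fit()
        return list(self.means)


model = TinyModel()
try:
    model.sample()
    raised = False
except NotFittedError:
    raised = True

model.fit([[1.0, 2.0], [3.0, 4.0]])
result = model.sample()
```

Setting `fitted` on the instance inside `fit` leaves the class-level default untouched, so new instances still start out unfitted.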
-
cumulative_distribution
(X)[source]¶ Compute the cumulative distribution value for each point in X.
- Parameters
X (pandas.DataFrame) – Values for which the cumulative distribution will be computed.
- Returns
Cumulative distribution values for points in X.
- Return type
numpy.ndarray
- Raises
NotFittedError – if the model is not fitted.
-
fit
(X)[source]¶ Fit the model to a table of values from multiple random variables.
- Parameters
X (pandas.DataFrame) – Values of the random variables.
-
fitted
= False¶
-
classmethod
from_dict
(params)[source]¶ Create a new instance from a parameters dictionary.
- Parameters
params (dict) – Parameters of the distribution, in the same format as the one returned by the to_dict method.
- Returns
Instance of the distribution defined on the parameters.
- Return type
Multivariate
-
classmethod
load
(path)[source]¶ Load a Multivariate instance from a pickle file.
- Parameters
path (str) – Path to the pickle file where the distribution has been serialized.
- Returns
Loaded instance.
- Return type
Multivariate
-
log_probability_density
(X)[source]¶ Compute the log of the probability density for each point in X.
- Parameters
X (pandas.DataFrame) – Values for which the log probability density will be computed.
- Returns
Log probability density values for points in X.
- Return type
numpy.ndarray
- Raises
NotFittedError – if the model is not fitted.
-
pdf
(X)[source]¶ Compute the probability density for each point in X.
- Parameters
X (pandas.DataFrame) – Values for which the probability density will be computed.
- Returns
Probability density values for points in X.
- Return type
numpy.ndarray
- Raises
NotFittedError – if the model is not fitted.
-
probability_density
(X)[source]¶ Compute the probability density for each point in X.
- Parameters
X (pandas.DataFrame) – Values for which the probability density will be computed.
- Returns
Probability density values for points in X.
- Return type
numpy.ndarray
- Raises
NotFittedError – if the model is not fitted.
-
sample
(num_rows=1)[source]¶ Sample values from this model.
- Parameters
num_rows (int) – Number of rows to sample.
- Returns
Array of shape (num_rows, *) with values randomly sampled from this model distribution.
- Return type
numpy.ndarray
- Raises
NotFittedError – if the model is not fitted.
-
save
(path)[source]¶ Serialize this multivariate instance using pickle.
- Parameters
path (str) – Path to where this distribution will be serialized.
-
-
class
copulas.multivariate.
GaussianMultivariate
(*args, **kwargs)[source]¶ Bases:
copulas.multivariate.base.Multivariate
Class for a multivariate distribution that uses the Gaussian copula.
- Parameters
distribution (str or dict) – Fully qualified name of the class to be used for modeling the marginal distributions or a dictionary mapping column names to the fully qualified distribution names.
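As a rough illustration of what a Gaussian copula does (this is not the library's implementation), the sketch below draws correlated standard normals, maps them to uniforms with the normal CDF, and pushes each uniform through a marginal inverse CDF. The two marginals, the correlation `rho`, and the helper name `sample_gaussian_copula` are all assumptions chosen for this example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_gaussian_copula(num_rows, rho, inverse_cdfs, seed=0):
    """Draw rows from a 2-D Gaussian copula with the given marginal inverse CDFs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_rows):
        # Correlated standard normals via the 2x2 Cholesky factor of the correlation matrix.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map to (0, 1) with the standard normal CDF, then to each marginal.
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        rows.append((inverse_cdfs[0](u1), inverse_cdfs[1](u2)))
    return rows


# Hypothetical marginals: exponential with rate 1, and normal with mean 10 and sd 2.
marginals = [
    lambda u: -math.log(1.0 - u),
    NormalDist(mu=10.0, sigma=2.0).inv_cdf,
]
samples = sample_gaussian_copula(2000, rho=0.8, inverse_cdfs=marginals)
```

The dependence between the two columns comes entirely from `rho`, while each column keeps its own marginal shape; that separation is the reason for modeling the marginals per column.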
-
columns
= None¶
-
correlation
= None¶
-
cumulative_distribution
(X)[source]¶ Compute the cumulative distribution value for each point in X.
- Parameters
X (pandas.DataFrame) – Values for which the cumulative distribution will be computed.
- Returns
Cumulative distribution values for points in X.
- Return type
numpy.ndarray
- Raises
NotFittedError – if the model is not fitted.
-
fit
(X, *args, **kwargs)¶
-
classmethod
from_dict
(copula_dict)[source]¶ Create a new instance from a parameters dictionary.
- Parameters
copula_dict (dict) – Parameters of the distribution, in the same format as the one returned by the to_dict method.
- Returns
Instance of the distribution defined on the parameters.
- Return type
Multivariate
-
probability_density
(X)[source]¶ Compute the probability density for each point in X.
- Parameters
X (pandas.DataFrame) – Values for which the probability density will be computed.
- Returns
Probability density values for points in X.
- Return type
numpy.ndarray
- Raises
NotFittedError – if the model is not fitted.
-
sample
(*args, **kwargs)¶
-
to_dict
()[source]¶ Return a dict with the parameters to replicate this object.
- Returns
Parameters of this distribution.
- Return type
dict
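The `to_dict` / `from_dict` pair is meant to round-trip: the dictionary returned by one can be passed to the other. A minimal sketch of that contract follows, using a hypothetical `ToyCopula` class; the keys shown are assumptions for illustration, and the real parameter layout may differ.

```python
import json


class ToyCopula:
    """Hypothetical stand-in; not the library's class or parameter layout."""

    def __init__(self, columns=None, correlation=None):
        self.columns = columns
        self.correlation = correlation

    def to_dict(self):
        # Return plain Python containers so the result can be stored as JSON.
        return {
            "columns": list(self.columns),
            "correlation": [list(row) for row in self.correlation],
        }

    @classmethod
    def from_dict(cls, params):
        # Rebuild an equivalent instance from the output of to_dict().
        return cls(columns=params["columns"], correlation=params["correlation"])


original = ToyCopula(["a", "b"], [[1.0, 0.3], [0.3, 1.0]])
payload = json.dumps(original.to_dict())            # store the parameters as text
restored = ToyCopula.from_dict(json.loads(payload))  # rebuild from the stored text
```

Unlike a pickle file, a parameters dictionary of this kind can be inspected and stored in a text format.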
-
univariates
= None¶
-
class
copulas.multivariate.
VineCopula
(*args, **kwargs)[source]¶ Bases:
copulas.multivariate.base.Multivariate
Vine copula model.
A vine is a graphical representation of one factorization of the \(n\)-variate probability distribution in terms of \(n(n - 1)/2\) bivariate copulas by means of the chain rule.
It consists of a sequence of levels, with as many levels as variables. Each level is a tree (no isolated nodes and no loops), so a tree with \(m\) nodes has exactly \(m - 1\) edges.
Each node in tree \(T_1\) is a variable, and edges are couplings of variables constructed with bivariate copulas.
Each node in tree \(T_{k+1}\) is a coupling in \(T_{k}\), expressed by the copula of the variables, while edges are couplings between two vertices that must have one variable in common; that shared variable becomes a conditioning variable in the bivariate copula. Thus, every level has one node fewer than the previous one. Once all the trees are drawn, the factorization is the product of all the nodes.
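The edge count can be checked with a little arithmetic. The sketch below counts only the trees that have at least one edge (\(T_1\) through \(T_{n-1}\)); the function name `vine_edge_counts` is made up for this example.

```python
def vine_edge_counts(n_variables):
    """Return the number of edges in trees T_1, ..., T_{n-1} of a vine."""
    # Tree T_k has n - k + 1 nodes, and a tree with m nodes has m - 1 edges.
    return [n_variables - k for k in range(1, n_variables)]


counts = vine_edge_counts(5)      # edges per tree for five variables
total_pair_copulas = sum(counts)  # one bivariate copula per edge
```

For five variables the trees have 4, 3, 2 and 1 edges, which add up to 10, matching \(n(n - 1)/2\) with \(n = 5\).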
- Parameters
vine_type (str) – Type of the vine copula; one of 'center', 'direct', or 'regular'.
random_state (int or np.random.RandomState) – Random seed or RandomState to use.
-
model
¶ Distribution to compute univariates.
-
u_matrix
¶ Univariates.
- Type
numpy.array
-
n_sample
¶ Number of samples.
- Type
int
-
n_var
¶ Number of variables.
- Type
int
-
columns
¶ Names of the variables.
- Type
pandas.Series
-
tau_mat
¶ Kendall correlation parameters for data.
- Type
numpy.array
-
truncated
¶ Max level used to build the vine.
- Type
int
-
depth
¶ Vine depth.
- Type
int
-
ppfs
¶ Percent point functions (inverse CDFs) of the univariates used by this vine.
- Type
list[callable]
-
fit
(X, *args, **kwargs)¶
-
classmethod
from_dict
(vine_dict)[source]¶ Create a new instance from a parameters dictionary.
- Parameters
vine_dict (dict) – Parameters of the Vine, in the same format as the one returned by the to_dict method.
- Returns
Instance of the Vine defined on the parameters.
- Return type
VineCopula
-
sample
(*args, **kwargs)¶
-
to_dict
()[source]¶ Return a dict with the parameters to replicate this Vine.
- Returns
Parameters of this Vine.
- Return type
dict
-
train_vine
(tree_type)[source]¶ Build the vine.
1. To construct the first tree \(T_1\), assign one node to each variable and then couple the nodes by maximizing the chosen measure of association. Different vine types impose different constraints on this construction, so they yield different trees at this level.
2. Select the copula that best fits the pair of variables coupled by each edge in \(T_1\).
3. Let \(C_{ij}(u_i, u_j)\) be the copula for a given edge \((u_i, u_j)\) in \(T_1\). Then for every edge in \(T_1\), compute either
\[{v^1}_{j|i} = \frac{\partial C_{ij}(u_i, u_j)}{\partial u_i}\]
or similarly \({v^1}_{i|j}\), which are conditional cdfs. When finished with all the edges, construct the new matrix \(v^1\), which has one column fewer than \(u\).
4. Set \(k = 2\).
5. Assign one node of \(T_k\) to each edge of \(T_{k-1}\). The structure of \(T_{k-1}\) constrains which edges of \(T_k\) are realizable, so the next step is to build, for every node in \(T_k\), a linked list of the nodes accessible from it.
6. As in step 1, couple the nodes of \(T_k\) by maximizing the chosen measure of association, subject to the constraints imposed by the kind of vine employed and by tree \(T_{k-1}\).
7. Select the copula that best fits each edge created in \(T_k\).
8. Recompute the matrix \(v^k\) as in step 3, but using \(T_k\) and \(v^{k-1}\) instead of \(T_1\) and \(u\).
9. Set \(k = k + 1\) and repeat from step 5 until all the trees are constructed.
- Parameters
tree_type (str or TreeTypes) – Type of trees to use.
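The conditional cdf used when moving from one tree to the next is the partial derivative of the pair copula with respect to the conditioning argument. As a worked illustration, the sketch below uses the Clayton copula (chosen here only as an example; the source does not say which copula families are fitted) and checks the closed-form derivative against a finite difference. All function names are hypothetical.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_conditional_cdf(u, v, theta):
    """P(V <= v | U = u): the partial derivative of C(u, v) with respect to u."""
    base = u ** -theta + v ** -theta - 1.0
    return u ** (-theta - 1.0) * base ** (-1.0 / theta - 1.0)


def numerical_partial_u(u, v, theta, step=1e-6):
    """Central finite difference of C in its first argument, used as a check."""
    upper = clayton_cdf(u + step, v, theta)
    lower = clayton_cdf(u - step, v, theta)
    return (upper - lower) / (2.0 * step)


u, v, theta = 0.3, 0.7, 2.0
analytic = clayton_conditional_cdf(u, v, theta)
numeric = numerical_partial_u(u, v, theta)
```

Applying such a function to every edge of a tree gives the columns of the next matrix of conditional values, which is why that matrix has one column fewer than its predecessor.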
-
class
copulas.multivariate.
Tree
(random_state=None)[source]¶ Bases:
copulas.multivariate.base.Multivariate
Helper class to instantiate a single tree in the vine model.
-
fit
(index, n_nodes, tau_matrix, previous_tree, edges=None)[source]¶ Fit this tree object.
- Parameters
index (int) – Index of the tree.
n_nodes (int) – Number of nodes in the tree.
tau_matrix (numpy.array) – Kendall's tau matrix of the data, shape (n_nodes, n_nodes).
previous_tree (Tree) – Tree object of the previous level.
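For readers unfamiliar with the `tau_matrix` argument, the following sketch shows how a Kendall's tau matrix of shape (n_nodes, n_nodes) can be computed from scratch. It uses the simple tau-a formula and assumes data without ties; the helper names are made up for this example, and the library may compute the matrix differently.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length sequences without ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # the pair is ordered the same way in x and y
        elif sign < 0:
            discordant += 1   # the pair is ordered in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def tau_matrix(columns):
    """Symmetric matrix of pairwise Kendall's tau values, one row per variable."""
    n = len(columns)
    return [[kendall_tau(columns[i], columns[j]) for j in range(n)] for i in range(n)]


data = [
    [1, 2, 3, 4, 5],   # variable 0
    [2, 1, 4, 3, 5],   # variable 1: mostly increasing with variable 0
    [5, 4, 3, 2, 1],   # variable 2: variable 0 reversed
]
taus = tau_matrix(data)
```

Entries close to 1 or -1 mark strongly associated pairs, which are the natural candidates for edges when the tree is built.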
-
fitted
= False¶
-
classmethod
from_dict
(tree_dict, previous=None)[source]¶ Create a new instance from a parameters dictionary.
- Parameters
tree_dict (dict) – Parameters of the Tree, in the same format as the one returned by the to_dict method.
- Returns
Instance of the tree defined on the parameters.
- Return type
Tree
-
get_adjacent_matrix
()[source]¶ Get adjacency matrix.
- Returns
adjacency matrix
- Return type
numpy.ndarray
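As a reminder of what an adjacency matrix encodes, here is a minimal sketch that builds one from a list of edges. It assumes an undirected tree and a symmetric 0/1 matrix; whether the library's matrix follows exactly this convention is not stated in the source, and `adjacency_matrix` is a hypothetical helper.

```python
def adjacency_matrix(n_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from (left, right) node pairs."""
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for left, right in edges:
        # Mark the edge in both directions because the tree is undirected.
        matrix[left][right] = 1
        matrix[right][left] = 1
    return matrix


# A star-shaped tree on four nodes with node 0 at the centre.
adjacency = adjacency_matrix(4, [(0, 1), (0, 2), (0, 3)])
```

In a tree with four nodes the matrix contains six ones, two for each of the three edges.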
-
get_likelihood
(uni_matrix)[source]¶ Compute the likelihood of the tree given a U matrix.
- Parameters
uni_matrix (numpy.array) – Univariate matrix on which to evaluate the likelihood.
- Returns
Likelihood of the current tree, and the conditional univariate matrix for the next level.
- Return type
tuple[float, numpy.array]
-
get_tau_matrix
()[source]¶ Get tau matrix for adjacent pairs.
- Returns
Tau matrix for the current tree.
- Return type
numpy.ndarray
-
to_dict
()[source]¶ Return a dict with the parameters to replicate this Tree.
- Returns
Parameters of this Tree.
- Return type
dict
-
tree_type
= None¶
-