Danger

You are looking at the documentation for an older version of the SDV! We are no longer supporting or maintaining this version of the software.

sdv.tabular.copulas.GaussianCopula

class sdv.tabular.copulas.GaussianCopula(field_names=None, field_types=None, field_transformers=None, anonymize_fields=None, primary_key=None, constraints=None, table_metadata=None, field_distributions=None, default_distribution=None, categorical_transformer=None, learn_rounding_scheme=True, enforce_min_max_values=True)

Model wrapping copulas.multivariate.GaussianMultivariate copula.

Parameters
  • field_names (list[str]) – List of names of the fields that need to be modeled and included in the generated output data. Any additional fields found in the data will be ignored and will not be included in the generated output. If None, all the fields found in the data are used.

  • field_types (dict[str, dict]) – Dictionary specifying the data types and subtypes of the fields that will be modeled. Field type and subtype combinations must be compatible with the SDV Metadata Schema.

  • field_transformers (dict[str, str]) –

    Dictionary specifying which transformer to use for each field. Available transformers are:

    • FloatFormatter: Uses a FloatFormatter for numerical data.

    • FrequencyEncoder: Uses a FrequencyEncoder without gaussian noise.

    • FrequencyEncoder_noised: Uses a FrequencyEncoder adding gaussian noise.

    • OneHotEncoder: Uses a OneHotEncoder.

    • LabelEncoder: Uses a LabelEncoder without gaussian noise.

    • LabelEncoder_noised: Uses a LabelEncoder adding gaussian noise.

    • BinaryEncoder: Uses a BinaryEncoder.

    • UnixTimestampEncoder: Uses a UnixTimestampEncoder.

  • anonymize_fields (dict[str, str]) – Dict specifying which fields to anonymize and which Faker category each of them belongs to.

  • primary_key (str) – Name of the field which is the primary key of the table.

  • constraints (list[Constraint, dict]) – List of Constraint objects or dicts.

  • table_metadata (dict or metadata.Table) – Table metadata instance or dict representation. If given alongside any other metadata-related arguments, an exception will be raised. If not given at all, it will be built using the other arguments or learned from the data.

  • field_distributions (dict) –

    Dictionary that maps the names of fields in the table being modeled to the distribution to use for each of them. Each distribution can be passed either as a copulas.univariate instance or as one of the following values:

    • gaussian: Use a Gaussian distribution.

    • gamma: Use a Gamma distribution.

    • beta: Use a Beta distribution.

    • student_t: Use a Student T distribution.

    • gaussian_kde: Use a GaussianKDE distribution. This model is non-parametric, so using this will make get_parameters unusable.

    • truncated_gaussian: Use a Truncated Gaussian distribution.

  • default_distribution (copulas.univariate.Univariate or str) – Copulas univariate distribution to use by default for fields that have no entry in field_distributions. Accepts the same values as field_distributions. Defaults to truncated_gaussian.

  • categorical_transformer (str) –

    Type of transformer to use for the categorical variables, which must be one of the following values:

    • OneHotEncoder: Apply a OneHotEncoder to the categorical column, which replaces the column with one boolean column for each possible category, indicating whether each row had that value or not.

    • LabelEncoder: Apply a LabelEncoder, which replaces the value of each category with an integer value that acts as its label.

    • LabelEncoder_noised: Apply a LabelEncoder with added noise, which replaces the value of each category with an integer value that acts as its label and then adds noise around that value.

    • FrequencyEncoder: Apply a FrequencyEncoder, which replaces each categorical value with a float number in the [0, 1] range which is inversely proportional to the frequency of that category.

    • FrequencyEncoder_noised: Apply a FrequencyEncoder with the add_noise argument set to True, which makes it add gaussian noise around each value.

    Defaults to FrequencyEncoder_noised.

  • learn_rounding_scheme (bool) – Whether to learn a rounding scheme for the FloatFormatter. If True, the data returned by reverse_transform will be rounded to the number of decimal places learned during fit. Defaults to True.

  • enforce_min_max_values (bool) – Specify whether or not to clip the data returned by reverse_transform of the numerical transformer, FloatFormatter, to the min and max values seen during fit. Defaults to True.
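As a rough illustration of how the string-valued options above fit together, the sketch below collects a few keyword arguments in a plain dict and checks them against the option names listed in this section. The column names (user_id, age, income) and the helper check_model_kwargs are hypothetical and not part of the SDV; only the keyword names and the option values are taken from the parameter descriptions.

```python
# Option names copied from the parameter descriptions above.
DISTRIBUTIONS = {
    "gaussian", "gamma", "beta", "student_t",
    "gaussian_kde", "truncated_gaussian",
}
CATEGORICAL_TRANSFORMERS = {
    "OneHotEncoder", "LabelEncoder", "LabelEncoder_noised",
    "FrequencyEncoder", "FrequencyEncoder_noised",
}


def check_model_kwargs(kwargs):
    """Hypothetical helper: list problems in string-valued options.

    Non-string distributions (for example copulas.univariate instances)
    are skipped, since they cannot be checked by name.
    """
    problems = []
    for field, dist in (kwargs.get("field_distributions") or {}).items():
        if isinstance(dist, str) and dist not in DISTRIBUTIONS:
            problems.append(f"{field}: unknown distribution {dist!r}")

    default = kwargs.get("default_distribution")
    if isinstance(default, str) and default not in DISTRIBUTIONS:
        problems.append(f"default_distribution: unknown value {default!r}")

    transformer = kwargs.get("categorical_transformer")
    if transformer is not None and transformer not in CATEGORICAL_TRANSFORMERS:
        problems.append(f"categorical_transformer: unknown value {transformer!r}")
    return problems


# Hypothetical table with columns user_id, age and income.
model_kwargs = {
    "primary_key": "user_id",
    "field_distributions": {"age": "beta", "income": "gamma"},
    "default_distribution": "truncated_gaussian",
    "categorical_transformer": "FrequencyEncoder_noised",
}

problems = check_model_kwargs(model_kwargs)
print(problems or "all string options match the documented names")
```

With the legacy SDV release described on this page installed, a dict like this could presumably be passed on as GaussianCopula(**model_kwargs). Fields without an entry in field_distributions would then fall back to default_distribution.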

__init__(field_names=None, field_types=None, field_transformers=None, anonymize_fields=None, primary_key=None, constraints=None, table_metadata=None, field_distributions=None, default_distribution=None, categorical_transformer=None, learn_rounding_scheme=True, enforce_min_max_values=True)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([field_names, field_types, …])

Initialize self.

fit(data)

Fit this model to the data.

get_distributions()

Get the marginal distributions used by this copula.

get_likelihood(table_data)

Get the likelihood of each row belonging to this table.

get_metadata()

Get metadata about the table.

get_parameters()

Get the parameters learned from the data.

load(path)

Load a TabularModel instance from a given path.

sample(num_rows[, randomize_samples, …])

Sample rows from this table.

sample_conditions(conditions[, …])

Sample rows from this table with the given conditions.

sample_remaining_columns(known_columns[, …])

Sample rows from this table.

save(path)

Save this model instance to the given path using cloudpickle.

set_parameters(parameters)

Regenerate a previously learned model from its parameters.
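The methods listed above suggest a typical order of calls: fit, then sample, then save and load. The following is a minimal sketch of that order, not an official recipe. The function name fit_sample_save is hypothetical. The model is passed in as an argument so that the sketch itself does not depend on a particular SDV installation, and load is called on the model's class on the assumption that it is a class-level method that takes only a path.

```python
def fit_sample_save(model, data, num_rows, path):
    """Hypothetical helper showing one plausible call order.

    `model` is expected to expose fit(data), sample(num_rows), save(path)
    and a class-level load(path), as listed in the method summary above.
    """
    # Learn the marginal distributions and the copula from the table.
    model.fit(data)

    # Draw synthetic rows from the fitted model.
    synthetic = model.sample(num_rows)

    # Persist the fitted model, then restore it from the same path.
    model.save(path)
    restored = type(model).load(path)

    return synthetic, restored
```

With the legacy SDV release installed, this could be called as fit_sample_save(GaussianCopula(), data, 100, "model.pkl"), where data is a pandas.DataFrame holding the table to model and the file name is only an example.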