sdv.metadata.table.Table
Table Metadata.
This class provides a unified layer of abstraction over the metadata of a single table, which includes all the details needed to handle the table's data: the data types, the fields that contain PII, and the constraints that apply to the data.
Parameters
name (str) – Name of this table. Optional.
field_names (list[str]) – List of names of the fields that need to be modeled and included in the generated output data. Any additional fields found in the data will be ignored and will not be included in the generated output. If None, all the fields found in the data are used.
field_types (dict[str, dict]) – Dictionary specifying the data types and subtypes of the fields that will be modeled. Field type and subtype combinations must be compatible with the SDV Metadata Schema.
field_transformers (dict[str, str]) – Dictionary specifying which transformer to use for each field. Available transformers are:
- integer: Uses a NumericalTransformer of dtype int.
- float: Uses a NumericalTransformer of dtype float.
- categorical: Uses a CategoricalTransformer without Gaussian noise.
- categorical_fuzzy: Uses a CategoricalTransformer that adds Gaussian noise.
- one_hot_encoding: Uses a OneHotEncodingTransformer.
- label_encoding: Uses a LabelEncodingTransformer.
- boolean: Uses a BooleanTransformer.
- datetime: Uses a DatetimeTransformer.
anonymize_fields (dict[str, str]) – Dictionary specifying which fields to anonymize and the Faker category each one belongs to.
primary_key (str) – Name of the field which is the primary key of the table.
constraints (list[Constraint, dict]) – List of Constraint objects or dicts.
dtype_transformers (dict) – Dictionary of transformer templates to use for the different data types. The keys must be dtype.kind values (i, f, O, b or M), and the values must be either RDT Transformer classes or RDT Transformer instances.
model_kwargs (dict) – Dictionary specifying the kwargs to use in each tabular model when working on this table. Its keys are the names of TabularModel classes and its values are dictionaries of keyword arguments. This argument exists mostly to ensure that models are fitted with the same arguments when the same Table is used to fit different model instances on different slices of the same table.
sequence_index (str) – Name of the column that acts as the order index of each sequence. The sequence index column can be of any type that can be sorted, such as integer values or datetimes.
entity_columns (list[str]) – Names of the columns that identify different time series sequences. These are used to group the data into separate training examples.
context_columns (list[str]) – The columns in the dataframe that are constant within each group/entity. These columns will be provided at sampling time (i.e., the samples will be conditioned on the context variables).
rounding (int, str or None) – Defines the rounding scheme for the NumericalTransformer. If set to an int, values will be rounded to that number of decimal places. If None, values will not be rounded. If set to 'auto', the transformer will round to the maximum number of decimal places detected in the fitted data. Defaults to 'auto'.
min_value (int, str or None) – Specify the minimum value the NumericalTransformer should use. If an integer is given, sampled data will be greater than or equal to it. If the string 'auto' is given, the minimum will be the minimum value seen in the fitted data. If None is given, there won’t be a minimum. Defaults to 'auto'.
max_value (int, str or None) – Specify the maximum value the NumericalTransformer should use. If an integer is given, sampled data will be less than or equal to it. If the string 'auto' is given, the maximum will be the maximum value seen in the fitted data. If None is given, there won’t be a maximum. Defaults to 'auto'.
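As a rough illustration of the 'auto' settings for rounding, min_value and max_value, the stand-alone sketch below learns the decimal places, minimum and maximum from fitted values and then clips and rounds sampled values. It is not the NumericalTransformer implementation: the helpers count_decimals, learn_numeric_bounds, resolve and postprocess are hypothetical, and counting decimal places from each float's repr is an assumption made for simplicity.

```python
from decimal import Decimal


def count_decimals(value):
    # Digits after the decimal point in the shortest repr, ignoring trailing zeros.
    exponent = Decimal(repr(value)).normalize().as_tuple().exponent
    return max(0, -exponent)


def learn_numeric_bounds(values):
    # What 'auto' resolves to (hypothetical): max decimal places, min and max of the fitted data.
    return {
        "rounding": max(count_decimals(v) for v in values),
        "min_value": min(values),
        "max_value": max(values),
    }


def resolve(setting, learned_value):
    # 'auto' defers to the learned value; an int or None is used as given.
    return learned_value if setting == "auto" else setting


def postprocess(samples, learned, rounding="auto", min_value="auto", max_value="auto"):
    rounding = resolve(rounding, learned["rounding"])
    min_value = resolve(min_value, learned["min_value"])
    max_value = resolve(max_value, learned["max_value"])
    result = []
    for value in samples:
        if min_value is not None:
            value = max(value, min_value)  # sampled data >= min_value
        if max_value is not None:
            value = min(value, max_value)  # sampled data <= max_value
        if rounding is not None:
            value = round(value, rounding)
        result.append(value)
    return result


learned = learn_numeric_bounds([1.5, 2.25, 10.0])
print(learned)  # {'rounding': 2, 'min_value': 1.5, 'max_value': 10.0}
print(postprocess([0.3, 4.56789, 12.0], learned))  # [1.5, 4.57, 10.0]
```

Passing min_value=None, max_value=None or rounding=None skips the corresponding step, mirroring the None semantics described for these parameters.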
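The roles of sequence_index, entity_columns and context_columns can be sketched in plain Python: rows are grouped by their entity column values, each group is ordered by the sequence index, and the context columns, which are expected to be constant within a group, are extracted once per entity. This is a hypothetical illustration, not the SDV code path; the function split_sequences, the list-of-dicts row format and the column names in the example are assumptions.

```python
def split_sequences(rows, entity_columns, sequence_index, context_columns):
    """Group rows by entity and order each group by sequence_index (hypothetical sketch)."""
    groups = {}
    for row in rows:
        key = tuple(row[column] for column in entity_columns)
        groups.setdefault(key, []).append(row)

    excluded = set(entity_columns) | set(context_columns)
    sequences = {}
    for key, group in groups.items():
        group.sort(key=lambda row: row[sequence_index])
        # Context columns are expected to hold a single value per entity.
        context = {column: group[0][column] for column in context_columns}
        for row in group:
            for column in context_columns:
                if row[column] != context[column]:
                    raise ValueError(f"context column {column!r} varies within entity {key}")
        # The remaining columns make up the time-varying steps of the sequence.
        steps = [{c: v for c, v in row.items() if c not in excluded} for row in group]
        sequences[key] = {"context": context, "steps": steps}
    return sequences


rows = [
    {"store": "A", "region": "north", "day": 2, "sales": 7},
    {"store": "B", "region": "south", "day": 1, "sales": 3},
    {"store": "A", "region": "north", "day": 1, "sales": 5},
]
sequences = split_sequences(rows, ["store"], "day", ["region"])
print(sequences[("A",)])
# {'context': {'region': 'north'}, 'steps': [{'day': 1, 'sales': 5}, {'day': 2, 'sales': 7}]}
```

Each entity becomes one training example, and at sampling time only the context values would need to be supplied per entity.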
Methods
__init__([name, field_names, field_types, …]) – Initialize self.
filter_valid(data) – Filter the data using the constraints and return only the valid rows.
fit(data) – Fit this metadata to the given data.
from_dict(metadata_dict[, dtype_transformers]) – Load a Table from a metadata dict.
from_json(path) – Load a Table from a JSON file.
get_dtypes([ids]) – Get a dict with the dtypes for each field of the table.
get_fields() – Get the fields metadata.
get_model_kwargs(model_name) – Return the required model kwargs for the indicated model.
make_ids_unique(data) – Repopulate any id fields in the provided data to guarantee uniqueness.
reverse_transform(data) – Reverse the transformed data to the original format.
set_model_kwargs(model_name, model_kwargs) – Set the model kwargs used for the indicated model.
set_primary_key(primary_key) – Set the primary key of this table.
to_dict() – Get a dict representation of this metadata.
to_json(path) – Dump this metadata into a JSON file.
transform(data[, on_missing_column]) – Transform the given data.
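filter_valid is described as filtering the data using the constraints and returning only the valid rows. The idea can be sketched by treating each constraint as a predicate over a row and keeping a row only if every predicate holds. The predicate representation, the helper filter_valid_rows and the two example constraints are hypothetical; SDV's Constraint objects have their own interface, which is not reproduced here.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def filter_valid_rows(rows: List[Row], constraints: List[Callable[[Row], bool]]) -> List[Row]:
    """Keep only the rows for which every constraint returns True."""
    return [row for row in rows if all(check(row) for check in constraints)]


def start_before_end(row: Row) -> bool:
    # Hypothetical constraint: "start" must not be later than "end".
    return row["start"] <= row["end"]


def positive_amount(row: Row) -> bool:
    # Hypothetical constraint: "amount" must be strictly positive.
    return row["amount"] > 0


rows = [
    {"start": 1, "end": 3, "amount": 10},
    {"start": 5, "end": 2, "amount": 4},
    {"start": 0, "end": 0, "amount": -1},
]
print(filter_valid_rows(rows, [start_before_end, positive_amount]))
# [{'start': 1, 'end': 3, 'amount': 10}]
```

With an empty constraint list every row is kept, since all() over no checks is True.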
Attributes
fitted