Danger

You are looking at the documentation for an older version of the SDV! We are no longer supporting or maintaining this version of the software. Click here to go to the new docs pages.
sdv.timeseries.deepecho.PAR

DeepEcho model based on the deepecho.models.par.PARModel class.

Parameters
field_names (list[str]) – List of names of the fields that need to be modeled and included in the generated output data. Any additional fields found in the data will be ignored and will not be included in the generated output. If None, all the fields found in the data are used.
field_types (dict[str, dict]) – Dictionary specifying the data types and subtypes of the fields that will be modeled. Field types and subtypes combinations must be compatible with the SDV Metadata Schema.
anonymize_fields (dict[str, str]) – Dict specifying which fields to anonymize and what faker category they belong to.
primary_key (str) – Name of the field which is the primary key of the table.
entity_columns (list[str]) – Names of the columns which identify different time series sequences. These will be used to group the data in separated training examples.
context_columns (list[str]) – The columns in the dataframe which are constant within each group/entity. These columns will be provided at sampling time (i.e. the samples will be conditioned on the context variables).
segment_size (int, pd.Timedelta or str) – If specified, cut each training sequence into several segments of the indicated size. The size can either be passed as an integer value, which will be interpreted as the number of data points to put in each segment, or as a pd.Timedelta (or equivalent str representation), which will be interpreted as the segment length in time. Timedelta segment sizes can only be used with sequence indexes of type datetime.
sequence_index (str) – Name of the column that acts as the order index of each sequence. The sequence index column can be of any type that can be sorted, such as integer values or datetimes.
context_model (str or sdv.tabular.BaseTabularModel) –
Model to use to sample the context rows. It can be passed as a string, which must be one of the following:
gaussian_copula (default): Use a GaussianCopula model.
Alternatively, a preconfigured Tabular model instance can be passed.
table_metadata (dict or metadata.Table) – Table metadata instance or dict representation. If given alongside any other metadata-related arguments, an exception will be raised. If not given at all, it will be built using the other arguments or learned from the data.
epochs (int) – The number of epochs to train for. Defaults to 128.
sample_size (int) – The number of times to sample (before choosing and returning the sample which maximizes the likelihood). Defaults to 1.
cuda (bool) – Whether to attempt to use cuda for GPU computation. If this is False or CUDA is not available, CPU will be used. Defaults to True.
verbose (bool) – Whether to print progress to console or not.
__init__ – Initialize self. See help(type(self)) for accurate signature.
Methods

__init__([field_names, field_types, …]) – Initialize self.
fit(timeseries_data) – Fit this model to the data.
get_metadata() – Get metadata about the table.
load(path) – Load a TabularModel instance from a given path.
sample([num_sequences, context, sequence_length]) – Sample new sequences.
save(path) – Save this model instance to the given path using cloudpickle.