sdv.tabular.ctgan.CTGAN.sample

CTGAN.sample(num_rows=None, max_retries=100, max_rows_multiplier=10, conditions=None, float_rtol=0.01, graceful_reject_sampling=False)
Sample rows from this table.
 Parameters
num_rows (int) – Number of rows to sample. If not given, the model will generate as many rows as there were in the data passed to the fit method.
max_retries (int) – Number of times to retry sampling discarded rows. Defaults to 100.
max_rows_multiplier (int) – Multiplier to use when computing the maximum number of rows that can be sampled during the reject-sampling loop. The maximum number of rows sampled at each iteration will be equal to this number multiplied by the requested num_rows. Defaults to 10.
conditions (pd.DataFrame, dict or pd.Series) – If this is a dictionary/Series which maps column names to the column value, then this method generates num_rows samples, all of which are conditioned on the given variables. If this is a DataFrame, then it generates an output DataFrame such that each row in the output is sampled conditional on the corresponding row in the input.
float_rtol (float) – Maximum tolerance when considering a float match. This is the maximum relative distance at which a float value will be considered a match when performing reject-sampling-based conditioning. Defaults to 0.01.
graceful_reject_sampling (bool) – If False, raises a ValueError when not enough valid rows could be sampled within max_retries trials. If True, prints a warning and returns as many rows as it was able to sample within max_retries trials. Defaults to False.
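The way num_rows, max_retries, max_rows_multiplier and float_rtol interact can be pictured with a small reject-sampling loop. The following is a minimal sketch in plain Python, not SDV's implementation: `float_matches`, `reject_sample` and `toy_generate` are hypothetical names, and both the relative-distance formula and the rule for growing the batch size (doubling, capped at `num_rows * max_rows_multiplier`) are assumptions chosen to be consistent with the parameter descriptions above.

```python
import random


def float_matches(sampled, target, float_rtol=0.01):
    """Assumed rule: the relative distance to the target is at most float_rtol."""
    if target == 0:
        return sampled == 0
    return abs(sampled - target) / abs(target) <= float_rtol


def reject_sample(generate, is_valid, num_rows, max_retries=100,
                  max_rows_multiplier=10):
    """Collect up to num_rows valid rows using at most max_retries trials."""
    max_batch = num_rows * max_rows_multiplier  # cap on rows drawn per iteration
    valid = []
    batch = num_rows
    for _ in range(max_retries):
        # Keep only the rows that satisfy the condition; discard the rest.
        valid.extend(row for row in generate(batch) if is_valid(row))
        if len(valid) >= num_rows:
            break
        # Hypothetical heuristic: request more rows next time, up to the cap.
        batch = min(max_batch, batch * 2)
    return valid[:num_rows]


# Toy stand-in for a fitted model: each row is a dict with one float column.
rng = random.Random(0)


def toy_generate(n):
    return [{"price": rng.uniform(9.0, 11.0)} for _ in range(n)]


# Condition on price == 10.0, accepting values within 1% of the target.
rows = reject_sample(
    toy_generate,
    lambda row: float_matches(row["price"], 10.0, float_rtol=0.01),
    num_rows=5,
)
```

In this sketch a larger max_rows_multiplier lets each trial draw more candidates, while a larger float_rtol accepts more of them, so either one reduces the chance of exhausting max_retries.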
 Returns
Sampled data.
 Return type
pandas.DataFrame
 Raises
ConstraintsNotMetError – If the conditions are not valid for the given constraints.
ValueError – If any of the following happens:
* any of the conditions’ columns are not valid.
* graceful_reject_sampling is False and not enough valid rows could be sampled within max_retries trials.
* no rows could be generated.
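The difference between the strict and graceful outcomes can be illustrated with a short stand-in. This is a hedged sketch only: `check_sampled` is a hypothetical helper, not part of SDV, the message text is invented, and `warnings.warn` is an assumed stand-in for however the library actually emits its warning.

```python
import warnings


def check_sampled(rows, num_rows, graceful_reject_sampling=False):
    """Mimic the documented outcomes once the retry budget is exhausted."""
    if len(rows) == 0:
        # Documented as an error case of its own.
        raise ValueError("No rows could be generated.")
    if len(rows) < num_rows:
        message = f"Only {len(rows)} of {num_rows} requested rows could be sampled."
        if not graceful_reject_sampling:
            raise ValueError(message)
        warnings.warn(message)  # graceful mode: warn and return what we have
    return rows


partial = ["row-1", "row-2"]  # pretend only 2 of 5 requested rows were valid

# Strict mode (the default): the shortfall surfaces as a ValueError.
try:
    check_sampled(partial, num_rows=5)
except ValueError as error:
    strict_error = str(error)

# Graceful mode: the partial result is returned alongside a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = check_sampled(partial, num_rows=5, graceful_reject_sampling=True)
```

Callers that can work with fewer rows than requested may prefer the graceful mode; callers that need exactly num_rows rows should keep the default and handle the ValueError.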