sdv.SDV.sample

SDV.sample(table_name=None, num_rows=None, sample_children=True, reset_primary_keys=False)

Generate synthetic data for one table or the entire dataset.

If a table_name is given and sample_children is False, the sampled table is returned as a pandas.DataFrame. If a table_name is given and sample_children is True, a dictionary containing both the table and all its descendant tables is returned.

If no table_name is given, the entire dataset is sampled and returned in a dictionary.

If num_rows is given, the root tables of the dataset will contain the indicated number of rows. Otherwise, the number of rows will be the same as in the original dataset. The number of rows in the child tables cannot be controlled; it always depends on the values sampled for the parent tables.

If reset_primary_keys is True, the primary key generators will be reset.
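The three calling modes described above can be sketched as follows. This is a minimal illustration, not part of the library: the helper name sample_in_documented_modes is hypothetical, sdv is assumed to be an SDV instance that has already been fitted, and table_name is assumed to name a table in its dataset.

```python
def sample_in_documented_modes(sdv, table_name, num_rows=None):
    """Call ``sample`` on an already fitted SDV instance in three ways.

    Hypothetical helper for illustration only. ``sdv`` is assumed to be
    fitted and ``table_name`` is assumed to exist in its dataset.
    """
    # No table_name: the entire dataset is sampled and returned as a dict.
    whole_dataset = sdv.sample()

    # table_name with sample_children left at its default (True): a dict
    # holding the table and all of its descendant tables.
    with_children = sdv.sample(table_name, num_rows=num_rows)

    # table_name with sample_children=False: a single pandas.DataFrame.
    table_only = sdv.sample(
        table_name, num_rows=num_rows, sample_children=False
    )

    return whole_dataset, with_children, table_only
```

Only arguments listed in the signature above are used; reset_primary_keys is left at its default of False.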

Parameters
  • table_name (str) – Name of the table to sample from. If not passed, sample the entire dataset.

  • num_rows (int) – Number of rows to sample. If None, sample the same number of rows as there were in the original table.

  • sample_children (bool) – Whether or not to sample child tables. Used only if table_name is given. Defaults to True.

  • reset_primary_keys (bool) – Whether or not to reset the primary key generators. Defaults to False.

Returns

  • A dict with the sampled table and its child tables when sample_children is True, or with the entire dataset when no table_name is given.

  • A pandas.DataFrame when table_name is given and sample_children is False.

Return type

dict or pandas.DataFrame

Raises

NotFittedError – If the SDV instance has not been fitted yet.
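Because the return type depends on the arguments, calling code may want to normalize the result before processing it. The helper below is a hypothetical sketch, not part of the library. It assumes that the returned dictionary is keyed by table name, which this page does not state explicitly.

```python
def as_table_dict(result, table_name):
    """Return the output of ``sample`` as a dict in every case.

    Hypothetical helper; assumes dict results are keyed by table name.
    """
    if isinstance(result, dict):
        # Whole dataset, or a table together with its descendants.
        return result
    # A single table (sample_children=False): wrap it under its own name.
    return {table_name: result}
```

With this wrapper, downstream code can iterate over the tables the same way whether or not child tables were sampled.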