Danger

You are looking at the documentation for an older version of the SDV. This version of the software is no longer supported or maintained.


sdv.lite.tabular.TabularPreset.sample_remaining_columns

TabularPreset.sample_remaining_columns(known_columns, max_tries_per_batch=100, batch_size=None, randomize_samples=True, output_file_path=None)

Sample the remaining columns of this table for rows whose values in some columns are already known.

Parameters
  • known_columns (pandas.DataFrame) – A pandas.DataFrame with the columns that are already known. The output is a DataFrame such that each row in the output is sampled conditionally on the corresponding row in the input.

  • max_tries_per_batch (int) – Number of times to try sampling discarded rows. Defaults to 100.

  • batch_size (int or None) – The batch size to use per attempt at sampling. If None, defaults to 10 times the number of rows.

  • randomize_samples (bool) – Whether to randomize the samples. If False, sampling uses a fixed seed and is reproducible. Defaults to True.

  • output_file_path (str or None) – The file to periodically write sampled rows to. If None, a temporary file is used.

Returns

Sampled data.

Return type

pandas.DataFrame
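
Example

The sketch below is one way to picture how known_columns, max_tries_per_batch and batch_size might interact. It assumes a simple rejection-sampling strategy, which the "discarded rows" wording suggests but this page does not spell out. It is not the SDV implementation: it uses plain dictionaries instead of a pandas.DataFrame, and the names sample_remaining_columns_sketch, draw_row and seed are hypothetical, introduced only for illustration.

```python
import random


def sample_remaining_columns_sketch(known_rows, draw_row, max_tries_per_batch=100,
                                    batch_size=None, seed=None):
    """Toy rejection sampler; NOT the SDV implementation.

    known_rows: list of dicts holding the already-known column values.
    draw_row:   hypothetical callable(rng) -> dict returning one full row.
    seed:       hypothetical stand-in for randomize_samples=False.
    """
    rng = random.Random(seed)
    # Assumed default, mirroring the documented "10 times the number of rows".
    batch_size = batch_size or 10 * len(known_rows)
    output = []
    for known in known_rows:
        match = None
        for _ in range(max_tries_per_batch):
            batch = [draw_row(rng) for _ in range(batch_size)]
            # Keep candidates that agree with every known value; discard the rest.
            kept = [row for row in batch
                    if all(row[col] == val for col, val in known.items())]
            if kept:
                match = kept[0]
                break
        if match is None:
            raise ValueError(
                f"no row matching {known} after {max_tries_per_batch} batches")
        # One output row per input row, in the same order.
        output.append(match)
    return output


def draw_row(rng):
    # Hypothetical stand-in for a fitted model with two categorical columns.
    return {"plan": rng.choice(["free", "pro"]),
            "region": rng.choice(["eu", "us", "apac"])}


known = [{"plan": "pro"}, {"plan": "free"}, {"plan": "pro"}]
rows = sample_remaining_columns_sketch(known, draw_row, seed=0)
```

Each element of rows keeps the plan value from the corresponding entry of known and fills in a sampled region. With the real API, the equivalent call would pass a pandas.DataFrame of known values to sample_remaining_columns on a fitted TabularPreset.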