CTGAN.sample_remaining_columns(known_columns, max_tries=100, batch_size_per_try=None, randomize_samples=True, output_file_path=None)

Sample rows from this table.

Parameters

  • known_columns (pandas.DataFrame) – A DataFrame containing the columns whose values are already known. Each row of the output is sampled conditionally on the corresponding row of this input.

  • max_tries (int) – Maximum number of attempts at re-sampling rows that were discarded. Defaults to 100.

  • batch_size_per_try (int) – The batch size to use per attempt at sampling. Defaults to 10 times the number of rows.

  • randomize_samples (bool) – Whether to randomize the samples. If False, a fixed seed is used, so repeated calls with the same input return the same rows. Defaults to True.

  • output_file_path (str or None) – Path of the file to which sampled rows are periodically written. If None, a temporary file is used. Defaults to None.
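The parameters above suggest a reject-and-retry loop: draw a batch of candidate rows, discard the ones that do not fit, and try again up to a limit. The following is a minimal plain-Python sketch of that idea, not SDV's implementation. Assumptions: `toy_model` is a hypothetical stand-in for a fitted CTGAN, rows are plain dicts rather than a pandas.DataFrame, a candidate is kept only if it matches every known value of its input row, and the fixed seed `0` is arbitrary.

```python
import random
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Model = Callable[[random.Random], Row]


def toy_model(rng: random.Random) -> Row:
    """Hypothetical stand-in for a fitted model: draws one complete row."""
    return {"color": rng.choice(["red", "green", "blue"]), "size": rng.randint(1, 10)}


def _fill_row(
    known: Row, model: Model, rng: random.Random, max_tries: int, batch_size: int
) -> Optional[Row]:
    """Return the first sampled row that agrees with all known values, or None."""
    for _ in range(max_tries):
        # One attempt: draw a whole batch, then discard non-matching candidates.
        batch = [model(rng) for _ in range(batch_size)]
        for candidate in batch:
            if all(candidate[col] == val for col, val in known.items()):
                return candidate
    return None


def sample_remaining_columns_sketch(
    known_rows: List[Row],
    model: Model = toy_model,
    max_tries: int = 100,
    batch_size_per_try: Optional[int] = None,
    randomize_samples: bool = True,
) -> List[Row]:
    # randomize_samples=False uses a fixed (arbitrary) seed, so calls are repeatable.
    rng = random.Random() if randomize_samples else random.Random(0)
    # Mirrors the documented default of 10 times the number of rows.
    batch_size = batch_size_per_try or 10 * len(known_rows)

    output: List[Row] = []
    for known in known_rows:
        row = _fill_row(known, model, rng, max_tries, batch_size)
        if row is None:
            # Simplification: give up as soon as any single row cannot be matched.
            raise ValueError(f"No rows could be generated for {known!r}")
        output.append(row)  # output[i] is conditioned on known_rows[i]
    return output


print(sample_remaining_columns_sketch(
    [{"color": "red"}, {"color": "blue", "size": 3}], randomize_samples=False
))
```

With `randomize_samples=False` the sketch returns identical rows on every call, while a condition the model never produces (for example `{"color": "purple"}`) exhausts `max_tries` and raises `ValueError`. The library's exact handling of rows that cannot be matched may differ from this simplification.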


Returns

Sampled data.

Return type

pandas.DataFrame

Raises

  • ConstraintsNotMetError – If the conditions are not valid for the given constraints.

  • ValueError – If any of the conditions’ columns are not valid, or if no rows could be generated.
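The first ValueError case depends on what counts as a valid conditioning column, which this page does not define. A minimal sketch, assuming (not confirmed here) that a column is valid only if it is one of the columns the model was fitted on; the helper `validate_condition_columns` and the name `fitted_columns` are hypothetical. The helper accepts any iterable of names, so a DataFrame's `.columns` can be passed directly.

```python
from typing import Iterable, List


def validate_condition_columns(
    condition_columns: Iterable[str], table_columns: Iterable[str]
) -> None:
    """Hypothetical check: every conditioning column must belong to the table."""
    allowed = set(table_columns)
    # Collect, sorted for a stable message, every column the table does not have.
    invalid: List[str] = sorted(c for c in condition_columns if c not in allowed)
    if invalid:
        raise ValueError(f"Invalid condition column(s): {invalid}")


# With pandas, this could be called as
# validate_condition_columns(known_columns.columns, fitted_columns)
validate_condition_columns(["color"], ["color", "size"])  # passes silently
```

Running such a check before sampling reports a misspelled column name immediately, rather than after a potentially long sampling loop.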