Constraints

SDV supports adding constraints within a single table. See the Constraints guide for more information about the available single-table constraints.

In order to use single-table constraints within a relational model, you can pass in a list of applicable constraints when adding a table to your relational Metadata. (See Relational Metadata for more information on constructing a Metadata object.)

In this example, we wish to add a FixedCombinations constraint to our sessions table, which is a child table of users. First, we will create a Metadata object and add the users table.

In [1]: from sdv import load_demo, Metadata

In [2]: tables = load_demo()

In [3]: metadata = Metadata()

In [4]: metadata.add_table(
   ...:     name='users',
   ...:     data=tables['users'],
   ...:     primary_key='user_id'
   ...: )
   ...: 

The metadata now contains the users table.

In [5]: metadata
Out[5]: 
Metadata
  root_path: .
  tables: ['users']
  relationships:

Now, we want to add the child table sessions, which carries a single-table constraint. In the sessions table, we only want combinations of (device, os) that appear in the original data.

In [6]: from sdv.constraints import FixedCombinations

In [7]: constraint = FixedCombinations(column_names=['device', 'os'])

In [8]: metadata.add_table(
   ...:     name='sessions',
   ...:     data=tables['sessions'],
   ...:     primary_key='session_id',
   ...:     parent='users',
   ...:     foreign_key='user_id',
   ...:     constraints=[constraint],
   ...: )
   ...: 

If we get the table metadata for sessions, we can see that the constraint has been added.

In [9]: metadata.get_table_meta('sessions')
Out[9]: 
{'fields': {'session_id': {'type': 'id', 'subtype': 'integer'},
  'user_id': {'type': 'id',
   'subtype': 'integer',
   'ref': {'table': 'users', 'field': 'user_id'}},
  'device': {'type': 'categorical'},
  'os': {'type': 'categorical'},
  'minutes': {'type': 'numerical', 'subtype': 'integer'}},
 'constraints': [{'constraint': 'sdv.constraints.tabular.FixedCombinations',
   'column_names': ['device', 'os']}],
 'primary_key': 'session_id'}
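
The constraint entries in this dictionary can also be read programmatically. The following is a minimal sketch in plain Python that works on an abbreviated copy of the dictionary shown above. The helper name summarize_constraints is hypothetical and not part of SDV, and the sketch assumes each entry stores the constraint class as a dotted path under the 'constraint' key, as in this output.

```python
# Table metadata as returned by metadata.get_table_meta('sessions') above,
# abbreviated to the keys this sketch needs.
table_meta = {
    'fields': {
        'device': {'type': 'categorical'},
        'os': {'type': 'categorical'},
    },
    'constraints': [
        {
            'constraint': 'sdv.constraints.tabular.FixedCombinations',
            'column_names': ['device', 'os'],
        }
    ],
    'primary_key': 'session_id',
}


def summarize_constraints(table_meta):
    """Return (class name, column names) for each constraint entry.

    Hypothetical helper, not part of SDV. It assumes each entry has a
    'constraint' key holding a dotted path, as in the output above.
    """
    summary = []
    for entry in table_meta.get('constraints', []):
        # Keep only the last component of the dotted path.
        class_name = entry['constraint'].rsplit('.', 1)[-1]
        columns = entry.get('column_names', [])
        summary.append((class_name, list(columns)))
    return summary


print(summarize_constraints(table_meta))
```

A table with no 'constraints' key yields an empty list, so the helper can be applied to every table in the metadata.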

We can now use this metadata to fit a relational model and synthesize data.

In [10]: from sdv.relational import HMA1

In [11]: model = HMA1(metadata)

In [12]: model.fit(tables)

In [13]: new_data = model.sample()

In the sampled data, every (device, os) combination in the sessions table should be one that also appears in the original data, which is what the constraint requires.

In [14]: new_data
Out[14]: 
{'users':    user_id country gender  age
 0        0      ES      F   25
 1        1      US    NaN   32
 2        2      ES      F   38
 3        3      BG      F   44
 4        4      UK      M   30
 5        5      DE      F   38
 6        6      US    NaN   45
 7        7      BG      M   42
 8        8      ES      M   27
 9        9      DE      M   35,
 'sessions':    session_id  user_id  device       os  minutes
 0           0        0  mobile  android       29
 1           1        1  mobile  android       21
 2           2        2  tablet  android       20
 3           3        3  mobile  android       29
 4           4        4  mobile  android       34
 5           5        5  mobile  android       34
 6           6        6  mobile  android       21
 7           7        7  tablet  android       17
 8           8        8  mobile  android       29
 9           9        9  tablet  android       23}
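
Rather than reading the output by eye, the sampled (device, os) pairs can be compared against the pairs in the original rows. The sketch below is plain Python over lists of row dictionaries. The helper name find_unseen_combinations and the toy rows are hypothetical and are not the demo data. With pandas DataFrames such as tables['sessions'] and new_data['sessions'], rows in this shape can be obtained with DataFrame.to_dict('records').

```python
def find_unseen_combinations(real_rows, synthetic_rows, column_names):
    """Return combinations in synthetic_rows that never occur in real_rows.

    Hypothetical helper, not part of SDV. Each row is a mapping from
    column name to value.
    """
    allowed = {tuple(row[name] for name in column_names) for row in real_rows}
    sampled = {tuple(row[name] for name in column_names) for row in synthetic_rows}
    return sorted(sampled - allowed)


# Toy rows for illustration only; these are not the demo data.
real_rows = [
    {'device': 'mobile', 'os': 'android'},
    {'device': 'tablet', 'os': 'android'},
    {'device': 'tablet', 'os': 'ios'},
]
good_sample = [
    {'device': 'tablet', 'os': 'android'},
    {'device': 'mobile', 'os': 'android'},
]
# Same rows plus one pair that is absent from real_rows.
bad_sample = good_sample + [{'device': 'mobile', 'os': 'ios'}]

print(find_unseen_combinations(real_rows, good_sample, ['device', 'os']))
print(find_unseen_combinations(real_rows, bad_sample, ['device', 'os']))
```

An empty result means every sampled combination was also present in the original rows. Any tuple returned is a combination the constraint should have ruled out.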