An open source project from the Data to AI Lab (DAI-Lab) at MIT.

SDV - The Synthetic Data Vault

Date: Dec 22, 2021 Version: 0.13.1


The Synthetic Data Vault (SDV) is an ecosystem of synthetic data generation libraries. It lets users learn from single-table, multi-table and time series datasets, and then generate new synthetic data that has the same format and statistical properties as the original dataset.
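
To make the learn-then-generate idea concrete, here is a minimal sketch in plain Python. It is not SDV's API or algorithm: `fit_table` and `sample_table` are hypothetical helpers invented for this illustration, the example rows are made up, and the model treats every column independently (a Gaussian for numeric columns, observed frequencies for everything else). SDV's own models are far richer; this only shows what "same format and similar statistical properties" can mean for a single table.

```python
import random
import statistics
from collections import Counter


def fit_table(rows):
    """Learn a simple per-column model from a list of dict rows.

    Assumption for this sketch: columns are modeled independently.
    Numeric columns are summarized by mean and standard deviation,
    all other columns by their observed category counts.
    """
    model = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            model[column] = (
                "numeric",
                statistics.mean(values),
                statistics.pstdev(values),
            )
        else:
            counts = Counter(values)
            model[column] = ("categorical", list(counts), list(counts.values()))
    return model


def sample_table(model, num_rows, seed=None):
    """Generate new rows that have the same columns as the fitted table."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(num_rows):
        row = {}
        for column, spec in model.items():
            if spec[0] == "numeric":
                _, mean, std = spec
                # Numeric values come back as floats in this sketch.
                row[column] = rng.gauss(mean, std)
            else:
                _, categories, weights = spec
                row[column] = rng.choices(categories, weights=weights, k=1)[0]
        synthetic.append(row)
    return synthetic


# Made-up example data, for illustration only.
real = [
    {"age": 23, "salary": 41000.0, "department": "sales"},
    {"age": 35, "salary": 58000.0, "department": "engineering"},
    {"age": 41, "salary": 62000.0, "department": "engineering"},
    {"age": 29, "salary": 45000.0, "department": "sales"},
]

model = fit_table(real)
synthetic = sample_table(model, num_rows=5, seed=0)
for row in synthetic:
    print(row)
```

The synthetic rows share the column names of the real table, and their values are drawn from distributions estimated on it. Because the columns are sampled independently, relationships such as the link between department and salary are lost, which is the kind of structure a dedicated synthetic data library is meant to preserve.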

Synthetic data can then be used to supplement, augment and in some cases replace real data when training machine learning models. It also makes it possible to test machine learning or other data-dependent software systems without the risk of exposure that comes with disclosing real data.

Under the hood, SDV uses several techniques based on probabilistic graphical modeling and deep learning. To support a variety of data storage structures, we employ unique hierarchical generative modeling and recursive sampling techniques.

Current functionality and features:

Try it out now!

If you want to quickly discover SDV, simply click the button below and follow the tutorials!


Join our Slack Workspace

If you want to be part of the SDV community to receive announcements of the latest releases, ask questions, suggest new features or participate in the development meetings, please join our Slack Workspace!


Explore SDV