The Synthetic Data Vault project is a collection of several open source libraries with purposes
that range from simple reversible transformations to the most complex state-of-the-art Generative
Deep Learning models.
Building and maintaining all these libraries is a community driven effort which relies on the work
of developers from all around the world that contribute their knowledge, code and feedback to make
the project grow and improve over time.
Are you ready to join this community? If so, please feel welcome and keep reading!
There are several ways to contribute to a project like SDV, and they do not always involve
If you want to contribute but do not know where to start, consider one of the following options:
If there is something that you would like to see changed in the project, or that you just want
to ask, please create an issue at the GitHub issues page.
If you do so, please:
Explain in detail what you are requesting.
Keep the scope as narrow as possible, to make it easier to implement or respond.
Remember that this is a volunteer-driven project and that the maintainers will attend every
request as soon as possible, but that in some cases this might take some time.
SDV could always use more documentation, whether as part of the official SDV
docs, in docstrings, or even on the web in blog posts, articles, and such, so feel free to
contribute any changes that you deem necessary, from fixing a simple typo, to writing whole
new pages of documentation.
Obviously, the main element in the SDV library is the code.
If you are willing to contribute to it, please head for the next sections for detailed guidelines
about how to do so.
Ready to contribute? Here’s how to set up SDV for local development.
Fork the SDV repo on GitHub.
Clone your fork locally:
$ git clone email@example.com:your_name_here/SDV.git
Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed,
this is how you set up your fork for local development:
$ mkvirtualenv SDV
$ cd SDV/
$ make install-develop
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Try to use the naming scheme of prefixing your branch with gh-X where X is
the associated issue, such as gh-3-fix-foo-bug. And if you are not
developing on your own fork, further prefix the branch with your GitHub
username, like githubusername/gh-3-fix-foo-bug.
Now you can make your changes locally.
While hacking your changes, make sure to cover all your developments with the required
unit tests, and that none of the old tests fail as a consequence of your changes.
For this, make sure to run the tests suite and check the code coverage:
$ make lint # Check code styling
$ make test # Run the tests
$ make coverage # Get the coverage report
When you’re done making changes, check that your changes pass all the styling checks and
tests, including other Python supported versions, using:
$ make test-all
Make also sure to include the necessary documentation in the code as docstrings following
the Google docstrings style.
If you want to view how your documentation will look like when it is published, you can
generate and view the docs with this command:
$ make view-docs
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Before you submit a pull request, check that it meets these guidelines:
It resolves an open GitHub Issue and contains its reference in the title or
the comment. If there is no associated issue, feel free to create one.
Whenever possible, it resolves only one issue. If your PR resolves more than
one issue, try to split it in more than one pull request.
The pull request should include unit tests that cover all the changed code
If the pull request adds functionality, the docs should be updated. Put
your new functionality into a function with a docstring, and add the
feature to the documentation in an appropriate place.
The pull request should work for all the supported Python versions. Check the Travis Build
Status page and make sure that all the checks pass.
All the Unit Tests should comply with the following requirements:
Unit Tests should be based only in unittest and pytest modules.
The tests that cover a module called sdv/path/to/a_module.py
should be implemented in a separated module called
Note that the module name has the test_ prefix and is located in a path similar
to the one of the tested module, just inside the tests folder.
Each method of the tested module should have at least one associated test method, and
each test method should cover only one use case or scenario.
Test case methods should start with the test_ prefix and have descriptive names
that indicate which scenario they cover.
Names such as test_some_methed_input_none, test_some_method_value_error or
test_some_method_timeout are right, but names like test_some_method_1,
some_method or test_error are not.
Each test should validate only what the code of the method being tested does, and not
cover the behavior of any third party package or tool being used, which is assumed to
work properly as far as it is being passed the right values.
Any third party tool that may have any kind of random behavior, such as some Machine
Learning models, databases or Web APIs, will be mocked using the mock library, and
the only thing that will be tested is that our code passes the right values to them.
Unit tests should not use anything from outside the test and the code being tested. This
includes not reading or writing to any file system or database, which will be properly
To run a subset of tests:
$ python -m pytest tests.test_sdv
$ python -m pytest -k 'foo'
The process of releasing a new version involves several steps combining both git and
bumpversion which, briefly:
Merge what is in master branch into stable branch.
Update the version in setup.cfg, sdv/__init__.py and
Create a new git tag pointing at the corresponding commit in stable branch.
Merge the new commit from stable into master.
Update the version in setup.cfg and sdv/__init__.py
to open the next development iteration.
Before starting the process, make sure that HISTORY.md has been updated with a new
entry that explains the changes that will be included in the new version.
Normally this is just a list of the Pull Requests that have been merged to master
since the last release.
Once this is done, run of the following commands:
If you are releasing a patch version:
If you are releasing a minor version:
If you are releasing a major version:
Sometimes it is necessary or convenient to upload a release candidate to PyPi as a pre-release,
in order to make some of the new features available for testing on other projects before they
are included in an actual full-blown release.
In order to perform such an action, you can execute:
This will perform the following actions:
Build and upload the current version to PyPi as a pre-release, with the format X.Y.Z.devN
Bump the current version to the next release candidate, X.Y.Z.dev(N+1)
After this is done, the new pre-release can be installed by including the dev section in the
dependency specification, either in setup.py:
install_requires = [
or in command line:
pip install 'sdv>=X.Y.Z.dev'