How to put production-like data into version control
morde last edited by
One goal of DevOps is to create reproducible, production-like environments from source-code repositories. To achieve this, I believe it's necessary to also restore all the data that is needed to set up the environment, so that automated and manual tests can be run there.
For example, let's take a CMS product that provides content (labels, images, rich-text) to a user-facing web application. We want to run integration tests or exploratory UI tests against that web application using production-like data from the CMS. Due to the nature of the product, the data itself can be manipulated at any time by users using the UI of the CMS.
There are two main use cases:
- Developers create example content or add required label translations for new features
- Content managers publish new content for production, which may include static content like label translations
In a typical setup, there are at least two environments, let's say staging and production. Due to the use cases described above, the CMS databases where the contents are stored will deviate quite quickly between these environments.
Without taking any measures here, after a while, the two databases will be completely out-of-sync and the staging database will not be "production-like" anymore. Additionally, each release of a new feature will require manual steps to add data like label translations into the production database, a step that is neither verified nor tested before the release.
So my question is: What is a good way to bring new "baseline" data from a staging database into a production database without overwriting the changes there? And what is a good way to bring data back from production into the staging environment, to provide production-like data for testing purposes? Ideally, all of this would be automated and part of version control.
Side Note: If we say "different CMS content doesn't change anything about testing", let's think of another use-case where the data behaves even more like configuration. For example, payment or delivery options that are offered on an e-Commerce site.
It depends a bit on what kind of CMS you use. Most CMSs offer a way to bundle data into packages that you can check in to Git and deploy to databases, almost like deploying code to web apps.
For example, we work with Sitecore, which lets you use TDS or Unicorn. Other big CMSs use different tools, but the concept is mostly the same.
The tools let you pick items from your database and serialise them into code/text files that are packaged and checked in to Git. Let's say you have a copy of the database on your development workstation. You make some updates to the database, then serialise the change with TDS into a package that you check in to your development branch.
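To make the serialisation idea concrete, here is a minimal sketch in Python. It is not how TDS or Unicorn store items; the function names, the JSON layout and the example item ids are all assumptions made up for illustration. It only shows the general concept: one text file per item, written deterministically so that Git diffs stay small.

```python
import json
import tempfile
from pathlib import Path


def serialize_items(items: dict[str, dict], target_dir: Path) -> list[Path]:
    """Write each item to its own JSON file so Git can diff it like code.

    Assumes item ids are safe to use as file names (hypothetical convention).
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for item_id in sorted(items):
        path = target_dir / f"{item_id}.json"
        # sort_keys and a fixed indent keep the output stable between runs,
        # so an unchanged item produces no diff.
        text = json.dumps(items[item_id], indent=2, sort_keys=True, ensure_ascii=False)
        path.write_text(text + "\n", encoding="utf-8")
        written.append(path)
    return written


def load_items(source_dir: Path) -> dict[str, dict]:
    """Read the serialised files back into an {item_id: fields} mapping."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(source_dir.glob("*.json"))
    }


if __name__ == "__main__":
    # Hypothetical content picked from a developer's local CMS database.
    picked = {
        "label-checkout": {"en": "Checkout", "de": "Kasse"},
        "label-giftcard": {"en": "Gift card"},
    }
    with tempfile.TemporaryDirectory() as tmp:
        for written_path in serialize_items(picked, Path(tmp)):
            print(written_path.name)
```

In a real project the target directory would live inside the repository, so the serialised files are committed together with the feature that needs them.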
When you deploy the development branch to your staging environment, the TDS package is applied and makes the same updates to the database of your staging environment.
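The question also asks how to do this without overwriting changes made in the target database. One possible approach, sketched below in Python, is to mark each packaged item as either "always update" or "create only if missing". This is an illustration under assumptions, not the TDS or Unicorn API: `PackagedItem`, its `overwrite` flag and `apply_package` are hypothetical names, and the database is a plain dict. Check what deployment options your own tool actually offers.

```python
from dataclasses import dataclass


@dataclass
class PackagedItem:
    item_id: str
    fields: dict
    # Hypothetical flag: True for developer-owned items that should always
    # match the repository, False for items content managers may edit.
    overwrite: bool


def apply_package(package: list[PackagedItem], database: dict) -> list[str]:
    """Apply packaged items to a target database and return the ids written."""
    changed = []
    for item in package:
        exists = item.item_id in database
        if exists and not item.overwrite:
            # Keep whatever was edited in the target environment.
            continue
        if not exists or database[item.item_id] != item.fields:
            database[item.item_id] = dict(item.fields)
            changed.append(item.item_id)
    return changed


if __name__ == "__main__":
    # A content manager already reworded the checkout label in production.
    production = {"label-checkout": {"en": "Check out now"}}
    release = [
        PackagedItem("label-checkout", {"en": "Checkout"}, overwrite=False),
        PackagedItem("label-giftcard", {"en": "Gift card"}, overwrite=False),
    ]
    print(apply_package(release, production))
    print(production)
```

Because items that already exist are skipped unless they are flagged for overwriting, applying the same package twice changes nothing the second time, which makes the step safe to run on every deployment.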