Setting up test data efficiently for large sets of integration tests: per test or not?



  • With large sets of integration tests, data setup is becoming a runtime bottleneck.

    What is your experience tackling similar situations?

    Scenario: 200 API integration tests concerning contracts, for which data prerequisites include the existence of a company and a person, each in a specific (non-default) state.

    Case 1: each test sets up its own data

    • Pro: test data is managed within the test itself, with no interference from other tests' data
    • Con: creating new companies/people for every test greatly slows down the total runtime

    Case 2: data setup is done mostly on test project level

    • Con: test data is shared across all tests in the project, making it harder to manage and to prevent tests from influencing each other
    • Pro: with reusable companies/people the data setup is much smaller, leading to much faster runtimes (tried and tested...)

    Case 3: data setup is done using database restore/snapshots

    • Con: test data is shared across all tests in the project, making it harder to manage and to prevent tests from influencing each other
    • Pro: with reusable companies/people the data setup is much smaller, leading to much faster runtimes
    • Con: running and debugging individual tests against a deployed environment becomes very difficult and time-consuming if the entire database must be restored first
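The trade-off between Case 1 and Case 2 can be sketched in a few lines. This is a minimal illustration, not your actual API: `create_company`/`create_person` are hypothetical helpers, and an in-memory SQLite database stands in for the real backend. Case 1 rebuilds the data on every call; Case 2 builds it once per run and hands out the same rows afterwards.

```python
import sqlite3

SCHEMA = """
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER, state TEXT);
"""

def create_company(conn, name, state):
    # Hypothetical helper standing in for your real "create company" API call.
    cur = conn.execute("INSERT INTO company (name, state) VALUES (?, ?)", (name, state))
    return cur.lastrowid

def create_person(conn, name, company_id, state):
    cur = conn.execute(
        "INSERT INTO person (name, company_id, state) VALUES (?, ?, ?)",
        (name, company_id, state))
    return cur.lastrowid

# Case 1: every test builds (and pays for) its own data.
def per_test_data(conn):
    cid = create_company(conn, "Acme", state="suspended")    # non-default state
    pid = create_person(conn, "Alice", cid, state="pending")
    return cid, pid

# Case 2: the shared dataset is built once per run; later calls reuse it.
_shared = None
def shared_data(conn):
    global _shared
    if _shared is None:
        cid = create_company(conn, "SharedCo", state="suspended")
        pid = create_person(conn, "Shared Person", cid, state="pending")
        _shared = (cid, pid)
    return _shared
```

With 200 tests, Case 1 runs the two inserts 200 times, Case 2 once; the cross-influence risk of Case 2 comes from every test receiving the same rows.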


  • If multiple tests can use the same datasets (from your question it seems they can), then I think Case 3 is more efficient, because you will spend less time overall creating the test datasets (not loading them, but creating them, i.e. writing the SQL scripts). You will create a dataset once and use it for many tests.

    It will be a "Pro" (not a "Con") to manage the data at project level: you will have visibility of all test cases and what they require, so you can spot patterns and similarities and assign the same dataset to test cases with similar test data requirements.

    You should, of course, keep a document that records which dataset is used by which tests. And when writing your test cases, include a "test data requirements" section listing what kind of data the test needs (example: "have 2 companies in the DB, one with 4 people and one with 5 people").
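That dataset-to-tests document can also live in code, where it is easy to query and to check in reviews. A minimal sketch, with entirely hypothetical dataset and test names:

```python
# Hypothetical manifest recording which dataset each test relies on.
# "requirements" is the human-readable "test data requirements" text.
DATASETS = {
    "two_companies_4_and_5_people": {
        "requirements": "2 companies in the DB, one with 4 people and one with 5 people",
        "used_by": {"test_contract_split", "test_contract_merge"},
    },
    "single_suspended_company": {
        "requirements": "1 company in state 'suspended' with 1 person in state 'pending'",
        "used_by": {"test_contract_blocked"},
    },
}

def dataset_for(test_name):
    """Return the name of the dataset a test depends on, or None if untracked."""
    for name, spec in DATASETS.items():
        if test_name in spec["used_by"]:
            return name
    return None
```

Keeping it next to the tests means that reassigning a test to a different dataset is a one-line, reviewable change.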

    You can also start loading the datasets before test execution begins (if your environment and budget allow it); that saves some "test preparation" time during the run. You can also use Docker containers (as the other answer suggests) to store these datasets and databases and speed up load time.
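The restore-from-snapshot pattern of Case 3 can be shown end to end with SQLite's built-in backup API; real servers (Postgres, SQL Server, etc.) have their own snapshot/restore tooling, so treat this purely as an illustration of the pattern: pay for the expensive setup once, then clone the snapshot cheaply for each test.

```python
import sqlite3

def build_template():
    """Run the expensive data setup once, producing a snapshot to clone from."""
    tpl = sqlite3.connect(":memory:")
    tpl.executescript("""
        CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
        INSERT INTO company (name, state) VALUES ('Acme', 'suspended');
    """)
    return tpl

def fresh_db(template):
    """Restore the snapshot into a clean connection; no setup SQL is re-run."""
    db = sqlite3.connect(":memory:")
    template.backup(db)  # sqlite3.Connection.backup copies template into db
    return db
```

Because every test gets its own restored copy, tests may mutate data freely without the cross-influence risk of a shared live dataset, which softens the "Con" listed under Case 3.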


