Is there an acceptable ratio of test data size to production data size?
morde last edited by
I would like to know whether there is an acceptable ratio of test data size to production data size. If my prod data is 200TB, would a test data size of 20TB be adequate? Does such a consideration exist?
There are several factors to consider when selecting test data. For example, if your 200TB of production data is essentially homogeneous (every record is equivalent for testing purposes), you could calculate a statistical sample size from the total population. If not, you could group the production data into smaller subsets whose members are equivalent and calculate a sample size for each subset. However, there can always be outliers, and if those aren't correctly identified you could be stepping over some pretty big holes. You might also want to identify failure indicators, such as production data that has historically been problematic, and make sure it is represented in the test set. Ultimately, there is no magic formula or fixed ratio. Part of the challenge as a tester is to determine the appropriate test data, and the appropriate amount of it, to provide confidence that the feature is well tested and has a low risk of failure with production data in the wild.
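To make the "calculate a sample size per subset" idea concrete, here is a rough sketch in Python. It uses Cochran's formula with a finite population correction, which is one common way to size a random sample when estimating a proportion (for example, the share of records that trigger a defect). This is my choice of formula, not something the question prescribes. The subset names and record counts are made up for illustration, and the defaults (95% confidence, 5% margin of error, worst-case p = 0.5) are assumptions you would tune. It counts records, not terabytes.

```python
import math

def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Records to sample from one homogeneous subset.

    Cochran's formula with finite population correction.
    Assumed defaults: z=1.96 (about 95% confidence), 5% margin of error,
    p=0.5 (worst-case variability). Assumes simple random sampling.
    """
    if population <= 0:
        return 0
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)        # shrink for a finite population
    return min(population, math.ceil(n))        # never ask for more than exists

# Hypothetical subsets of the production data (record counts, not bytes).
subsets = {
    "orders": 5_000_000_000,
    "returns": 20_000_000,
    "legacy_imports": 1_000,
}

for name, count in subsets.items():
    print(f"{name}: sample {sample_size(count)} of {count} records")
```

Note how the required sample barely grows once a subset is large: the result depends on the confidence and margin you want, not on a percentage of production volume. This only covers the statistical part. Known outliers and historically problematic records still have to be added by hand, since a random sample will often miss them.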