Is it OK to use the classes under test to initialize the database for the tests?
I work in a small in-house dev department in a non-IT organisation. Another junior dev and I are creating a mid-sized CRUD web application in ASP.NET MVC. There is no formal test process in our department. I am trying to gently introduce some best practices, including unit testing, of which I only have theoretical knowledge myself.
We started by creating a nice, clean database schema. During the requirements analysis and database modelling phases, I went through piles of real data from our domain and then entered it all by hand into the database. Then the other dev got the task of creating the Model part of the project, writing the entity and repository classes needed for Linq to SQL.
I outlined my vision for the unit tests of his classes: we already have a small but consistent set of data we can use as test data. I suggested that he automatically recreate the database from scratch, fill the tables with an SQL script, and then let the tests run against that data set. He agreed and went to work.
Now, a few weeks later, he has for some reason decided that he doesn't like working with the data as it is. He initializes the database by re-creating the schema, and then fills it using our Linq classes and their connections. He does not pay attention to keeping the data in a state which is sensible for the domain. Example: in my data, a mouse could have a mutation expressed in the mammary glands. In his data, a drosophila can have a mutation expressed in the mammary glands, because he just set random connections between records from the proper tables, without caring about business meaning.
I see two problems with this approach.
- I think that the tests that are supposed to verify the work of our Linq classes should not use the Linq classes to initialize the data set. From my point of view, this is a bad case of circular reasoning.
- I would like to keep one set of test data which is applicable for most types of test, maybe with a few necessary variations. While there is no reason why the data used for unit testing the database connection should be the same as the data used for tests involving the users, there is also no reason to make it different. And the users cannot work with data which tells them that a drosophila has mammary glands. So, I would like us to adapt our tests to work with the old data set.
My co-worker is unhappy, because my suggestions mean extra workload. He feels that they are completely unnecessary.
My questions for you: First, is his approach wrong (so we should change it), or is it right (maybe one of many possible right approaches)? Second, is my own suggestion right or wrong? Third, if his approach is so wrong that we should change it, what arguments can I use to convince him? (He is not at all convinced by my "circular reasoning" argument.)
Update: User246 answered:
Of course if you don't implicitly trust your Linq classes, you should test them.
Under which circumstances would I trust my Linq classes? Do I need to test them, or not?
He/she also said:
One way to test the Linq classes would be to write the data using Linq and then read the data using some other means
This is exactly what I was planning to do. My question was: Why? Why can't I test the Linq classes with themselves? I thought it was so obviously a bad idea that it didn't need an explanation. But now that my co-worker does not believe that it is a bad idea, I have realised that I have no arguments against it. So, what are the arguments for this view?
Welcome to SQA, Rumi P. It sounds like you have a bootstrap problem rather than a chicken-and-egg problem. There is nothing intrinsically wrong with using software under test to create your test data, especially if it lets you write and maintain tests more easily.
Of course if you don't implicitly trust your Linq classes, you should test them. One way to test the Linq classes would be to write the data using Linq and then read the data using some other means (e.g. hard-coded SQL), and vice versa.
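To make the "write one way, read another way" idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is only an analogue of the Linq to SQL situation, not the actual classes from the question: AnimalRepository, the animal table, and the sample records are all hypothetical stand-ins for the repository classes under test.

```python
import sqlite3


class AnimalRepository:
    """Hypothetical stand-in for the data-access classes under test."""

    def __init__(self, conn):
        self.conn = conn

    def add(self, name, species):
        self.conn.execute(
            "INSERT INTO animal (name, species) VALUES (?, ?)", (name, species)
        )
        self.conn.commit()

    def find_species(self, name):
        row = self.conn.execute(
            "SELECT species FROM animal WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None


def make_db():
    # Fresh schema for every test, created without the repository.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animal (name TEXT PRIMARY KEY, species TEXT NOT NULL)"
    )
    return conn


def test_write_path():
    # Write through the repository, verify with hard-coded SQL.
    conn = make_db()
    AnimalRepository(conn).add("M-17", "mouse")
    rows = conn.execute("SELECT name, species FROM animal").fetchall()
    assert rows == [("M-17", "mouse")]


def test_read_path():
    # Seed with hard-coded SQL, verify by reading through the repository.
    conn = make_db()
    conn.execute(
        "INSERT INTO animal (name, species) VALUES ('D-3', 'drosophila')"
    )
    assert AnimalRepository(conn).find_species("D-3") == "drosophila"
```

Each test exercises only one direction of the class under test, so a bug in the write path cannot be masked by a matching bug in the read path.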
There is an underlying testing principle here that I did not bother to articulate, so here it is: if your system under test communicates with the outside world (e.g. a database or a network connection), and you only test the system using its own operations, all you have verified is that the system is consistent with itself.
Here is an example. Let's say you want to test a class that reads and writes files. It has two operations:
- readFile takes a filename and returns an array of bytes representing the file contents.
- writeFile takes a filename and an array of bytes and creates that file.

Now, one way to test this would be to fill up an array of bytes, write it to a file using writeFile, read the file back using readFile, and verify that the outgoing array is equal to the incoming array. Certainly, if that test fails, you have a problem.
Now consider this implementation of readFile and writeFile:

    function writeFile(filename, byteArray):
        //... write byteArray to file /home/bob/test.txt
        return

    function readFile(filename):
        byteArray = contents of file /home/bob/test.txt
        return byteArray

readFile and writeFile allegedly communicate with an external system, in this case the file system, but they can be implemented in such a way that you cannot verify that fact using the system under test's operations. The system is consistent with itself but is not consistent with the outside world.
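The same flaw can be reproduced in a few lines of runnable Python. This is a sketch under stated assumptions: SCRATCH is a hypothetical fixed path in the system temp directory, playing the role of the hard-coded file in the implementation above, and the two check functions are illustrative helpers rather than part of any real test framework.

```python
import os
import tempfile

# Hypothetical hard-coded location, analogous to the fixed path above.
SCRATCH = os.path.join(tempfile.gettempdir(), "flawed_file_store.bin")


def write_file(filename, data):
    # Bug: the filename argument is ignored.
    with open(SCRATCH, "wb") as f:
        f.write(data)


def read_file(filename):
    # Same bug: always reads the hard-coded file.
    with open(SCRATCH, "rb") as f:
        return f.read()


def round_trip_ok(filename, data):
    # Uses only the system under test's own operations.
    write_file(filename, data)
    return read_file(filename) == data


def independent_check_ok(filename, data):
    # Verifies the write by reading the file directly from the file system.
    write_file(filename, data)
    if not os.path.exists(filename):
        return False
    with open(filename, "rb") as f:
        return f.read() == data
```

For a target path that does not exist yet, round_trip_ok reports success while independent_check_ok reports failure, because the requested file is never created. Only the check that goes around the system under test exposes the bug.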
I hope you can infer how this relates to your question (and my answer) about the Linq classes.