What particular testing requirements do data-driven applications have?
Different sorts of software need different approaches to testing. An interactive GUI-driven application will be tested in a very different way from a piece of middleware, for example. Consider data-oriented projects specifically, such as those developing machine vision or "Data Science" applications: what particular and idiosyncratic test requirements set such applications apart from other classes of software development?
The first two issues that come to mind are scalability and tolerance of faulty data. Data Science frequently implies large data sets. As a tester, you need a sense of the expected data set sizes and the performance expected at those sizes. Those metrics need to be put in the context of the computing environment, of course; what you can do on a laptop running R will not be the same as what you can do with a hundred-machine Hadoop cluster. Data Science will also often involve data sets containing some amount of bad data as a result of transcription problems, faulty sensors, buggy data collection software, and similar causes. Some developers will focus on making the algorithm work with clean data. Your job may be to understand how "real" data might differ from the developer's idealized data, and to test with data that reflects those differences.
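One way to probe tolerance of faulty data is to take a clean data set, deliberately corrupt a fraction of it, and check that the result stays close to the clean-data result. The following is only a minimal sketch of that idea in Python: `robust_mean` stands in for whatever code is actually under test, and `corrupt`, the junk values, the 5% fault rate, and the 0.1 tolerance are all assumptions chosen for illustration, not taken from any particular project.

```python
import math
import random


def robust_mean(readings):
    """Hypothetical function under test: mean of the usable numeric readings."""
    valid = []
    for r in readings:
        try:
            x = float(r)
        except (TypeError, ValueError):
            continue  # skip entries such as None, "" or "N/A"
        if math.isfinite(x):  # drop NaN and infinite values (e.g. sensor faults)
            valid.append(x)
    if not valid:
        raise ValueError("no valid readings")
    return sum(valid) / len(valid)


def corrupt(readings, fraction, seed=0):
    """Test helper: replace roughly `fraction` of the entries with junk values."""
    rng = random.Random(seed)  # fixed seed keeps the test repeatable
    junk = [None, "", "N/A", float("nan"), float("inf")]
    return [rng.choice(junk) if rng.random() < fraction else r for r in readings]


# Idealized data of the kind a developer might test with.
clean = [20.0 + 0.1 * (i % 10) for i in range(1000)]
# "Real" data: same readings, with about 5% of entries damaged.
dirty = corrupt(clean, fraction=0.05)

# The result on dirty data should stay close to the clean result.
assert abs(robust_mean(dirty) - robust_mean(clean)) < 0.1

# Data with nothing usable in it should fail loudly, not return a number.
try:
    robust_mean([None, "N/A", float("nan")])
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for all-bad input")
```

The same generator can be reused for scalability checks by varying the length of `clean` and timing the function under test, with the acceptable times set according to the target environment.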