Data Testing for Massive CSV/XML files
Some background: I am automating, for continuous deployment, a massive number of massive files that use different data structures and gather different data. I know this is vague, but sadly that is the best way I can explain it. One of the primary goals of the automation is to verify that the data is as accurate as possible. There are approximately 200 different data structures, all containing different pieces of information and designed in different styles: XML, CSV, pipe-delimited, etc. I was wondering if anyone has experience with such a project and, if so, what tools or methods were used to complete it. To avoid generalizing and just fishing for ideas: Is there a tool that can assist in parsing multiple different types of data structures (listed above) and associate them with data from another source, preferably on the fly?
Using your favorite scripting language would probably be the best solution. Peter suggested Perl, and I'll add Python to the list: both have excellent modules for parsing and analyzing CSV and XML, and plenty of capabilities to help with your other tasks.
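To make that concrete, here is a minimal Python sketch of the pattern: normalize each format (CSV, pipe-delimited, XML) into a common row-dict shape using only the standard library, then cross-check records from two sources by joining on a shared key. The file contents, tag names, and the `id` join key are all hypothetical placeholders for whatever your real structures contain.

```python
import csv
import io
import xml.etree.ElementTree as ET

def parse_delimited(text, delimiter=","):
    """Parse CSV or pipe-delimited text into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

def parse_xml(text, record_tag):
    """Parse XML text, yielding one dict per <record_tag> element."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec}
            for rec in root.iter(record_tag)]

# Hypothetical sample data in three of the formats described.
csv_text = "id,name\n1,alpha\n2,beta\n"
pipe_text = "id|score\n1|90\n2|75\n"
xml_text = "<records><rec><id>1</id><name>alpha</name></rec></records>"

csv_rows = parse_delimited(csv_text)
pipe_rows = parse_delimited(pipe_text, delimiter="|")
xml_rows = parse_xml(xml_text, "rec")

# Cross-source verification: join on the shared "id" key and
# flag any record missing from the other feed.
scores_by_id = {row["id"]: row["score"] for row in pipe_rows}
for row in csv_rows:
    assert row["id"] in scores_by_id, f"id {row['id']} missing from pipe feed"
```

For truly massive files you would swap `ET.fromstring` for `ET.iterparse` and stream the delimited files row by row instead of loading them into lists, but the normalize-then-join shape stays the same.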