Better way to store panel data in SQL or NoSQL
Demir last edited by
I have an Excel of raw panel data (cross sectional + time series) sent to me by customers (I cannot change this). For those not familiar, imagine the gdp time series for each country. I need to transfer this data into a database and I am about to choose the best technology, either a SQL (Microsoft SQL Server), or No SQL (MongoDB).
In principle I am quite sure you can use both, but probably an SQL-like is less suitable. My comments:
SQL: I would need to replicate the name of the country as many times as the timestamp avalilable for its gdp, and stacking each country one below the other. When I need to update (because every 3-months a new gdp value comes out), I would need to code complex queries in order to correctly insert the new value where a country ends but before the next one starts (FYI: to insert data into the db I will have a python code which push new data into the db);
NoSQL: here I can have a collection named GDP, and documents related to each country, so for the document "UK", I'd have a key called (time-series) and a value like [[1-1-2022, 2.3], [1-4-2022, 3.1], ..] (list of list where the first is the timestampt and the second value the gdp) and so on;
in addition since this data are quite messy it can happen that from time to time some data is missing so a NoSQL simply will not have a value, which is less problematic than a SQL. To me, NoSQL is better in this case.
Since in this forum there are expert people (I am not..), I would like to know if my reasoning is right, and for panel data a NoSQL (es. MongoDB, because is free..) type is more suitable.
I agree with the comments that for the use cases you described, it sounds like you'll likely end up needing to do at least the same amount of work (if not more) manipulating and maintaining the data in a NoSQL solution as you would in a relational database management system.
The advantage of a NoSQL solution is usually for the use case where you have a very loosely defined / highly changing schema, and the amount of work to maintain those changes in the database layer outweighs the benefits of maintaining them there. A good use case is especially if you don't need to do any transformations / manipulations to the data, and you just want to house it exactly as it comes in, leaving the work to some other consumer to handle structuring it, such as end users whom are self sufficient and are reporting directly off of it (e.g. a PowerBI scenario perhaps).
For most people this isn't the case, rather they usually work directly or close with the consumer that needs to maintain some sort of defined structure and / or they want to do additional transformations/ manipulations to the data in the database layer, such as your case sounds.
So for you, it really comes down to which system you feel more comfortable maintaining, as both will handle your goals equally and both will require some level of work to maintain. Objectively, they're pretty equal systems otherwise.
For further reading on the benefits and drawbacks of one system over the other, please see this https://dba.stackexchange.com/a/286082/150011 I wrote on a somewhat relevant DBA.StackExchange question.