Best way for archiving old mysql records
we have a Ruby on Rails app with a MySQL DB (on AWS RDS). Some of our tables have a large number of records ~100 million and this number is rapidly increasing every month. Most of the records in these tables are inactive and are only accessed occasionally for some reports that users can access.
We are now facing several issues:
- query time on these tables has increased significantly even with index optimisation
- restoring from a backup in case of an emergency would take a very long time due to the large size of the tables
Our requirements are:
- data needs to be persisted indefinitely somewhere
- (fast) read access to data
- no further write access necessary after a certain period has passed since the creation of a record
- low cost
We were therefore planning on doing the following:
- after a record has gone "inactive" (this is largely determined by a period of time that has passed since creation), we would create a record in DynamoDB mirroring the data in our sql db.
- after several days have passed we then delete the sql record from the mysql db
- the record can then still be read through dynamodb
Our question was if this sort of approach is conventional and if there are potentially better ways of achieving what we want. Any input would be much appreciated.
morde last edited by
DynamoDB can be a good fit for this. Set your table class to "Standard - Infrequent Access" since you plan to have a lot of data relative to the reads/writes. You'll get very fast read access (retrievals in milliseconds), with On Demand mode you wouldn't even have to pay unless you were doing retrievals, and you can set it and forget it.
Is DynamoDB the best? It depends on details you haven't quantified. If by "fast" reads you mean a second or two then you have more choices. Does "low cost" mean you'd rather run your own servers to save a few bucks? Then again, you have more choices.
With Standard-IA prices of 10c/GB and if each item is about 1KB then you're looking at 100,000,000 * 1KB = 100 GB = $10/mo for storage.
If you read at a rate of 10 retrievals a second that's 10 RCU * $0.00013/RCU-hr = $0.94/mo for reads.
No cost for writes after the initial load (which if each item is 1 KB would be about $18 for 100m items using Provisioned mode).