Should I use SQL or NoSQL for a file catalog?



  • I will be implementing a system that stores 250 million files per user. I want to perform list operations from the client side, with the client application fetching 100 records at a time.

    What is the record I will be fetching?

    • File name
    • Last-modified timestamp
    • HTTPS URI for the file, which is stored in S3-like storage

    On the server side, should I use SQL or NoSQL to store this metadata?

    I was thinking about using an RDBMS with the following schema:

    • UserID
    • recordid
    • fileName
    • timestamp
    • URI

    As the fetch query just needs to return 100 records at a time, I was thinking SQL. If we saved all of a user's information in a single row in NoSQL, it would take a long time to query the next 100 records or to append new files for a particular user.

    Any suggestions? I am new to this, so please let me know if my question is too vague/broad and I can update it for any specific questions.



  • Your use case is of the simplest kind, so you'd likely see very similar performance regardless of whether you use an RDBMS or a NoSQL system. But here are the things you should actually consider when deciding on a database system:

    1. Do you have a well-structured schema?

      Answer: Yes, it appears you do, since you're able to directly articulate the structure of that schema by listing its fields: UserId, RecordId, FileName, Timestamp, URI.

    2. Will your schema change so frequently that you couldn't keep the database entity structure up to date with it?

      Answer: I wouldn't think so, based on the kind of data you're planning to store in your files table, but that's for you to decide. NoSQL's schema-less flexibility is best when you have a frequently changing or non-concrete schema and, as a developer, you don't want to take on the responsibility of maintaining structural changes on the database side. But if you're OK with maintaining your database entities if and when the schema changes, then an RDBMS will work just fine as well.

    3. Is your data relational?

      Answer: Yes, it sounds like it, especially since you mentioned you have "user information" as well, which I assume relates to your files table through the UserId field.
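
    To make the schema and the UserId relationship concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the separate users table, the composite primary key, and the open_catalog helper are all assumptions for illustration; the question only names the fields.

```python
import sqlite3

# Hypothetical schema: column names follow the fields listed in the question;
# the types and the users table are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY
);
CREATE TABLE files (
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    record_id INTEGER NOT NULL,
    file_name TEXT    NOT NULL,
    modified  TEXT    NOT NULL,   -- last-modified timestamp, ISO 8601 text
    uri       TEXT    NOT NULL,   -- HTTPS URI of the object in storage
    PRIMARY KEY (user_id, record_id)
);
"""

def open_catalog(path=":memory:"):
    """Open a database at `path` and create the two tables in it."""
    conn = sqlite3.connect(path)
    # SQLite does not enforce foreign keys unless this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

    With a composite key on (user_id, record_id), each user's rows sit together in the key's index, which is the access path a per-user listing would rely on. Other RDBMSs would need their own type names, but the shape should carry over.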

    There are other factors you could consider as well, such as cost efficiency, ease of maintaining the infrastructure as it scales, and sharding vs. vertical scaling. But these are more granular concerns that most database systems address nowadays (regardless of whether you pick a SQL or NoSQL solution), and they are more complex and outside the general scope of when to choose one over the other.

    The questions above are the main ones I think should be used to decide between an RDBMS and a NoSQL solution. Since your schema is well defined and relational, an RDBMS sounds like a good choice for your use case.
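
    As a rough illustration of the "100 records at a time" listing in an RDBMS, here is a sketch using keyset pagination with Python's built-in sqlite3 module. Keyset pagination is one possible approach and is not something the question prescribes; the table layout, the list_files and demo helpers, and the example.com URIs are all hypothetical.

```python
import sqlite3

PAGE_SIZE = 100  # the question fetches 100 records at a time

def list_files(conn, user_id, after_record_id=0, page_size=PAGE_SIZE):
    """Return the next page of rows for one user, ordered by record_id."""
    return conn.execute(
        "SELECT record_id, file_name, modified, uri FROM files "
        "WHERE user_id = ? AND record_id > ? "
        "ORDER BY record_id LIMIT ?",
        (user_id, after_record_id, page_size),
    ).fetchall()

def demo():
    """Build a small in-memory catalog and fetch its first two pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (user_id INTEGER, record_id INTEGER, "
        "file_name TEXT, modified TEXT, uri TEXT, "
        "PRIMARY KEY (user_id, record_id))"
    )
    rows = [
        (1, i, f"file{i}.txt", "2020-01-01T00:00:00Z", f"https://example.com/{i}")
        for i in range(1, 251)
    ]
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", rows)
    first = list_files(conn, 1)
    # The client passes back the last record_id it saw to get the next page.
    second = list_files(conn, 1, after_record_id=first[-1][0])
    return first, second
```

    Because each page starts from the last record_id the client saw, the query can seek directly into the (user_id, record_id) key instead of skipping over earlier rows, and appending a new file is a single-row insert.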



