What is the most efficient design pattern for a SQL database containing billions of rows per user in a single table?



  • I work on a relatively large system where we have started to run into performance problems scaling for multiple users.

    The system is a .NET application, so queries are written using an ORM (Entity Framework), and the database is an Azure SQL database.

    I'm a developer, not a DBA. Typically, when we've hit performance limits and have optimised our queries to the best of our ability but are still throttling the database, I scale up to a higher tier to increase our DTUs and the problem is solved.

    We're now at a point where it would be cheaper to give individual users their own database, rather than scale any further.

    I won't go into the details of what we do, but essentially we have a constant stream of data being sent from our users, which on average writes about 100,000 rows of data per user, per day, to the same table. Our users need quick access to this data, which typically involves loading one month to a year of data at a time.

    My question is - In this scenario, what options do I have to maintain our performance?

    As far as I can tell, my only options are:

    1 - I generate each user their own table within the database (if that's even possible), so I only need to deal with a few billion rows per user when querying (35b per year).

    2 - I generate each user their own database (which should help with the performance hit from concurrent queries, but would be a nightmare to manage).

    3 - I just keep throwing more money at Azure until it becomes technically impossible to scale any further?

    Thanks.



  • I like your question and I will try to find a solution for you.

    Your scenario is exactly where Entity Framework and other ORMs lead applications like yours: "Just write code, don't think about the DB"... until you have to think about the DB.

    As said in the comment, hiring a DBA is the next step.

    Anyway, let's look at the solutions we might approach:

    Multi Tenant Architecture (All customers on the same DB)

    • Azure Hyperscale: this is a good fit for a Multi Tenant architecture (which is your case, all customers on the same database), but you need to work out whether it is going to be more or less expensive than Azure SQL Database. Keep in mind that once you scale to Hyperscale (https://docs.microsoft.com/en-us/azure/azure-sql/database/service-tier-hyperscale) you cannot switch back to Azure SQL Database.
    • Cosmos DB: You are already using an ORM, right? The idea was not to care about the DB anyway. See whether you can follow the migration guide at https://docs.microsoft.com/en-us/azure/data-factory/how-to-sqldb-to-cosmosdb and move your Azure SQL Database to Cosmos DB, which is tailored for heavy load.

    Single Tenant Architecture (One DB per customer)

    • Azure SQL Database: you are already on that option, and when you scale up the problem is solved. However, DTUs are expensive, so the problem is being solved by burning money. Try to extract a few customers into their own databases, simulate the load, and see if it's less expensive.
    • Azure SQL Edge: "[...] writing about 100,000 rows of data per user, per day" sounds like an IoT workload to me, so why not try Azure SQL Edge (https://azure.microsoft.com/en-us/products/azure-sql/edge/#product-overview), which is made for data streaming? Here too you have to test it before you move completely.

    And now the reply to each of your questions:

    1. Generate each user their own table within the database: Please don't do it. The database will still be under pressure and you will not solve the problem.
    2. I generate each user their own database: That is the solution for me, moving from Multi Tenant to Single Tenant: divide et impera. And it is not that hard to do if you are already on Azure SQL Database. If you have Customer1, Customer2 and Customer3, you just need to copy the database (https://docs.microsoft.com/en-us/azure/azure-sql/database/database-copy?tabs=azure-powershell) that you already have, once per customer. If you point the same .NET application to each of the databases, the application will work. You then need to DELETE all the data related to the other customers from each database and, voilà, you have moved from a Multi Tenant to a Single Tenant architecture.
    3. I just keep throwing more money at azure: This is impractical.
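
    The copy-then-delete migration from point 2 can be sketched roughly as below. This is only an illustration in Python with in-memory SQLite, not Azure tooling: `Connection.backup` stands in for the Azure SQL database copy, and the `readings` table, its columns, and the `split_tenants` helper are hypothetical names made up for the example.

```python
import sqlite3


def split_tenants(shared, tenant_ids):
    """Return {tenant_id: connection}; each is a full copy pruned to one tenant."""
    per_tenant = {}
    for tenant_id in tenant_ids:
        copy = sqlite3.connect(":memory:")
        # Stand-in for the Azure SQL "copy database" step (assumption).
        shared.backup(copy)
        # Remove every row that belongs to another customer.
        copy.execute("DELETE FROM readings WHERE customer_id <> ?", (tenant_id,))
        copy.commit()
        per_tenant[tenant_id] = copy
    return per_tenant


# Hypothetical shared (multi-tenant) table with a few sample rows.
shared = sqlite3.connect(":memory:")
shared.execute(
    "CREATE TABLE readings (customer_id TEXT, recorded_at TEXT, value REAL)"
)
shared.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("Customer1", "2024-01-01", 1.0),
        ("Customer2", "2024-01-01", 2.0),
        ("Customer3", "2024-01-01", 3.0),
        ("Customer1", "2024-01-02", 4.0),
    ],
)
shared.commit()

databases = split_tenants(shared, ["Customer1", "Customer2", "Customer3"])
for tenant_id, db in databases.items():
    remaining = db.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
    print(tenant_id, remaining)
```

    On a real system the DELETE would have to cover every table that carries customer data, and on tables this large it is probably best run in batches.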

    So the solution for me is moving from a Multi Tenant to a Single Tenant architecture and deciding between Azure SQL Database and Azure SQL Edge.
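
    With one database per customer, the application needs some way to pick the right database for each request. A minimal sketch of that lookup, again in Python for illustration only: the `TENANT_DATABASES` catalog, the server name, and the database names are placeholders, not values from your system.

```python
# Hypothetical catalog mapping each customer to its own database.
# The connection strings are placeholders, not real servers.
TENANT_DATABASES = {
    "Customer1": "Server=tcp:example.database.windows.net;Database=tenant_customer1;",
    "Customer2": "Server=tcp:example.database.windows.net;Database=tenant_customer2;",
    "Customer3": "Server=tcp:example.database.windows.net;Database=tenant_customer3;",
}


def connection_string_for(tenant_id, catalog=TENANT_DATABASES):
    """Look up the connection string for one tenant, failing loudly if unknown."""
    try:
        return catalog[tenant_id]
    except KeyError:
        raise LookupError(f"no database registered for tenant {tenant_id!r}") from None


print(connection_string_for("Customer2"))
```

    The same application code then runs unchanged against whichever database the lookup returns, which is why pointing the existing .NET application at each copy works.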

    But please, hire a DBA.



