Database design: Combining good schema design with good user experience



  • Context

    I am designing a table schema for a solution that will track "punishments" issued by multiple clients. Each client represents a group of people of various sizes. We will be using PostgreSQL and according to our current numbers I do not expect this table to exceed 500k-1mil rows (of course, it would be nice to be able to scale this up if we suddenly take off and get a lot more clients).

    The Problem

    Normally, I would just store the punishments with an auto generated sequence as the primary key. However, we would like to share the punishment IDs with the clients. In the ideal case, each client ID has its own sequential punishment ID, so that when they open the dashboard they are greeted with a monotonically increasing sequence of punishment IDs corresponding to punishments that they issued. If I were to use an auto incremented sequence for every row, it could be that a client sees IDs e.g. 1, 4, 8, 12, 13, 32. However, this is not human friendly to the client and our clients would prefer that, if we use a number as an ID, this would increase and not make it appear like there are "gap".

    For further context, this is how a client would interact with punishments:

    • They can query all punishments they have issued.
    • For a specific punishment ID, they can query all the details, some of which are omitted in the general query above.
    • Given a specific person, they must be able to see all punishments issued to that person (within the client's group, not globally).
    • They may be able to issue a new punishment, which would ideally generate a coherent punishment ID safely (i.e. no race conditions). Additionally, the client-friendly ID must be returned at the end of the query so it can be displayed to them (think "Created punishment #53").

    Example table of an ideal solution:

    id group_id (client) punishment foo bar
    1 1 1 foo1 bar1
    2 1 2 foo2 bar2
    3 2 1 fizz1 buzz1
    4 3 1 p np

    Here, querying for client 1's punishments would give the ones with punishment 1 and 2. Client 2 looking up foo for punishment 1 would return fizz. The next punishment ID for clients 1, 2, 3 would be 3, 2, 2 respectively.

    Solution 1: Tables/Views

    A very naive solution would be to make a table per client, and then to just use the correct table. However, there are thousands of clients and I feel like this may not be a particularly great solution (mostly since I have not heard of it in university nor in the field).

    Another solution I had was to create views. Perhaps a view per client? I admit I'm not particularly familiar with views, are there any sort of restrictions you should put on creating views? https://www.postgresqltutorial.com/postgresql-updatable-views/ looks like something that could be useful. Here, I would then make the ID displayed to the client the row number when querying the particular view. A concern I have though is that when inserting a new punishment, determining the row number could be quite tricky to do efficiently, especially after https://wiki.postgresql.org/wiki/Count_estimate .

    Solution 2: Auxiliary Table

    I would essentially keep a table of the last known punishment IDs per client and then do something like:

    start transaction
    upsert punishment ID for client, if fail, abort transaction
    insert into punishments with newly created ID, if fail, abort transaction
    commit transaction
    

    Is this a good practice though?

    Solution 3: Non-integer IDs

    Something else I had in mind was to use non-integer IDs (think Git commit hashes). An automatic alphanumeric string would work for this. However, I would want to keep these IDs short (preferably not exceed 6 characters) and for my RDBMS to manage collisions for me (if this is possible, if not, I can also do it myself). In this case, the clients would just see some random letter/number spam and not panic because of gaps in the sequence. I know that UUIDs are a thing and I have used them before, but in this context they are not client friendly enough to be used.

    Question

    What is the "best" solution here? Or even better, what are the pros/cons of each? How can I decide on the best tradeoff between me the developer&dba who wants the database to perform well and the client who wants their output to be idiot friendly?



  • Let's tackle some of the pros and cons of each these proposed solutions in reverse:

    Solution 3: Non-integer IDs: Not a bad idea, except the human readability / idiot-friendliness goes out the window pretty quickly, especially when you care about collisions. And if you have a limit of 6 characters (to improve the readability), you're going to have a bad time with managing the collisions . The pro of this solution such as using a pseudo-random data type such as GUID is there's a level of improved privacy / security by decreasing the idiot-friendliness of your IDs.

    Solution 2: Auxiliary Table: Not a terrible idea, but if I understand you correctly, you're saying you'd use a separate table to specifically store the last PunishmentId per Client?...I think that would be unnecessarily redundant data that could be more liable to worse data integrity. Rather you could just use your Punishments table to tell you what the max PunishmentId is for the Client you're creating a new Punishment for. With proper indexing, and the cardinality of the number of Punishments per Client likely not being a lot relative to the entire table size, it should be fast enough to lookup against your Punishments table when you need to get the max PunishmentId per Client.

    Solution 1: Tables/Views (one table per client): Yea that's going to be unnecessarily difficult to manage for the number of clients you have for what sounds like should be a fairly simple system. Sometimes multi-database to client pattern is used as a solution for decoupling large amounts of data, especially in cases where the client owns the data and / or wants it to live on a separate server from the other databases. But I don't think your system sounds like it necessarily needs that additional complication for the problem you're trying to solve. It also adds additional complication for querying appropriately from the application layer or if you wanted to report off the data ever.

    Solution 0: What you could do: You could leverage a view on top of your Punishments table with a https://www.postgresql.org/docs/14/functions-window.html such as https://www.postgresqltutorial.com/postgresql-row_number/ that generates a user-friendly fake ID for just display purposes but is backed by the actual primary key of that row. In this case you could use your globally increasing auto increment identity column for the PunishmentId but your window function would partition by the group_id and order by PunishmentId. This would ensure the fake user-friendly ID is deterministic and sequential for that individual client. Here's some example code you could use to create a view around to accomplish that:

    CREATE VIEW PunishmentsDisplay AS
    

    SELECT id, group_id, punishment, foo, bar,
    ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY id) AS FakePunishmentId
    FROM Punishments;

    Then you can just select from your view for display purposes like so:

    SELECT id, FakePunishmentId, group_id, punishment, foo, bar
    FROM PunishmentsDisplay
    WHERE group_id = 123;
    



Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2