Lists and list items instead of one table per 'list'?



  • A SQL database I am working on at a customer has been created with an unusual structure.

    For example, instead of having one table for each of the following:

    • Country,

    • State,

    • City,

    • Units (in, m, miles per hour),

    they only have two tables:

    1. One for the list IDs (fields: list index and list name). Example records: City, Units
    2. One for the items in every list (fields: item index, item description, and a foreign key referring to the related list index above). Example records: km/h, Atlanta, miles, Paris

    The ID sequences in the list-item table are organized so that there are gaps separating the items of different lists, in case one needs to add new items to a list:

    • Seq for Unit may start at 20000

    • Seq for City may start at 30000

    So if 10 new Units need to be added, their indexes stay grouped with the already created Units without overflowing into another list such as City. The application layer deals with the sequence range for each group of items.

    They have 30+ such lists in the list table and thousands of items in the item table, each relating to a list ID; a rough sketch of the two tables is below.
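
    To illustrate, the two tables look roughly like this (table and column names are my own shorthand, not the actual schema):

    create table List (
        ListId   int         not null primary key,   -- list index
        ListName varchar(50) not null                -- e.g. 'City', 'Units'
    );

    create table ListItem (
        ItemId      int           not null primary key,  -- item index; ID ranges reserved per list
        Description nvarchar(100) not null,              -- e.g. 'Atlanta', 'km/h'
        ListId      int           not null references List (ListId)
    );

    -- The application reserves ID ranges per list, e.g. Units from 20000, City from 30000.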

    I think this is nonsense for multiple reasons, but they think it is efficient because 'we have just 2 tables instead of 30'. They also think it is novel and, at the same time, a fairly typical way to work. I understand this can work, technically, but I anticipate multiple unnecessary problems with the implementation of these 2 tables.

    My question: is it typical or an accepted practice to do something like the above?



  • This is a known pattern (sometimes called a "one true lookup table"), but not a good one, and not novel. While the application can be made to run this way, you miss out on a lot of benefits.

    First, foreign keys. It is not possible to declare a foreign key that references the list-item table "but only the IDs in the 10,000 range." So proper referential integrity is gone. There's nothing to stop a product weighing 10 Londons and being coloured Frankfurt.
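
    As a sketch of what is lost (table and column names are illustrative):

    -- With one table per list, the constraint says exactly what is allowed:
    alter table Product
        add constraint FK_Product_Colour
        foreign key (ColourId) references Colour (ColourId);

    -- With the shared table, the best you can declare accepts *any* item --
    -- a city, a unit, a colour -- because they all live in the same table:
    alter table Product
        add constraint FK_Product_Colour
        foreign key (ColourId) references ListItem (ItemId);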

    Second, data types. All the list values' data types must be the lowest common denominator. In practice this means a Unicode string. Again, there's nothing to stop a vehicle's passenger count being one of {1, 2, 3, Tuesday}.

    Similarly, the length must be the maximum imaginable, not just the longest known. This has overheads in storage, memory consumption and so on.
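
    In other words (column definitions illustrative):

    -- A dedicated table can use the right type and length for each value:
    create table Cities (
        CityId   int          not null primary key,
        CityName nvarchar(60) not null    -- as long as the longest plausible city name
    );

    -- In the shared table every value shares one column, so it must be a
    -- generously sized Unicode string even for values that are really numbers:
    --     Description nvarchar(200) not null   -- 'Atlanta', 'km/h', '3', ...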

    Third, you may end up with only one table, but there are still 30+ entity types shoehorned into that table. These have not gone away. This adds unnecessary cognitive load. Instead of looking for a city in a table called Cities you have to remember it is in a table called Lists, between 10,000 and 15,000 (or was that 20,000 to 30,000?).
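
    For example (the ListId value is made up):

    -- Self-explanatory:
    select CityName from Cities;

    -- Requires remembering which ListId (or ID range) means "city":
    select Description from ListItem where ListId = 7;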

    Fourth, joins lose self-documentation. Instead of

    ... Data as d
    inner join Cities as c on c.CityId = d.CityId
    inner join Units as u on u.UnitId = d.UnitId
    

    we instead have

    ... Data as d
    inner join Lists as l1 on l1.ListId = d.CityId
    inner join Lists as l2 on l2.ListId = d.UnitId
    

    I know which I prefer.

    Fifth, key sizes. With separate tables each table's key could likely be a 1-byte integer, good for 256 values. Lumped together it will almost certainly be a 4-byte int. Granted, that is not a whole lot of bytes overall, but why get worked up about reducing the table count and not the byte count?
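
    For instance (assuming a DBMS with a 1-byte tinyint type; names illustrative):

    -- Per-entity table: a 1-byte key is plenty for a list of units
    UnitId tinyint not null

    -- Shared table spanning 30+ lists with reserved ID ranges: needs a 4-byte key
    ItemId int not null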

    I'll stop here and end with this observation: no DBMS worthy of the name will care two hoots about 30 fewer tables in its catalog. The design will make absolutely zero difference to run times, but it will haunt maintenance for years to come.



