hierarchical data best practice in relational database



  • we have 1 table that also references itself in parent_id

    table high level structure: (let's assume that data is above 1M rows)

     category_table
     id int
     name varchar
     parent_id int
    

    the goal is to get the categories with their parent category names.

    i have 2 approach in mind

    A) do a select query and join category_table to itself

     select a.*, b.name from category_table a left join category_table b on a.id=b.parent_id
    

    B) add another column for the parent name = parent_name

      category_table
      id int
      name varchar
      parent_id int
      parent_name varchar
    

    then do a single select statement to fetch the data

     select * from category_table
    

    my worries are, A is what i think the right approach is but when fetching data it may be slow. B is faster than A however i will introduce a redundant column and will be a challenge when i update the real parent name

    would like to know your thoughts on which one is the best way to approach this and if you have any other suggestions



  • As with all denormalization decisions, it boils down to two things:

    1. How important is it to optimize the query performance? If the query with the JOIN provides satisfactory performance, then use that, because the database will enforce its own data integrity to prevent update anomalies. However, if the difference is performance is great enough that you definitely cannot satisfy your performance objective if you use the JOIN solution, then you must use the denormalization.

    2. Do you have a robust plan for accounting for the data anomalies? You correctly identified that you could have a challenge if category names change, because you need to remember to update them in multiple rows. How confident are you that you can write the code perfectly to do that? Alternatively, do you have a plan for detecting and correcting anomalies that occur if updates fail to update the category name in all necessary places? Do the updates and the correction code account for concurrent updates?


Log in to reply
 

Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2