Join table performance



  • This question is mainly about performance.

    Assume there are two tables in a DB like MySQL: one with millions of records and one with a few thousand records. With this example I am trying to show a very unbalanced situation. So the question is: how expensive would the (inner) JOIN operation be? (Joining on, for example, user_goup from the first table and id from the second, roughly as sketched below.) Do you think this architecture can scale to more records in the tables?
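
    For concreteness, a minimal sketch of the kind of query meant here (the table names big_table and small_table are made up for illustration; only the join columns come from the question):

        -- big_table: millions of rows; small_table: thousands of rows.
        SELECT *
        FROM big_table b
        INNER JOIN small_table s
                ON b.user_goup = s.id;   -- join columns named in the question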



  • It would depend more on the distribution of data in both tables; the total number of records in each table is much less of a factor here. If Table1, which has millions of rows, had only 1 row that matched the JOIN conditions to Table2 (the table with thousands of rows), and an INNER JOIN was used, then only 1 row would be returned in the result set. (This is known as the cardinality: the number of rows that match your predicates.) With proper indexing this should be achievable in milliseconds, and that holds for almost any total number of rows in both tables (e.g. even if the table had billions of rows).

    But there are a lot of factors that influence the outcome above which your question leaves out, so that's just one example. If the tables weren't properly indexed (e.g. on user_goup and on id, respectively; see the sketch after this answer), you could end up with a full table scan, which would be much slower on the table with millions of records. The type of JOIN being used would also change the outcome and could perform differently from the example above, and, again, how the data is distributed and how much data is actually returned would also affect the total runtime.
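
    As a rough illustration of the indexing mentioned above (a sketch only; the table names big_table and small_table are hypothetical, matching the example in the question):

        -- Index the join column on the large table. small_table.id is usually
        -- already indexed if it is the PRIMARY KEY.
        CREATE INDEX idx_big_table_user_goup ON big_table (user_goup);

        -- EXPLAIN shows whether MySQL uses the index for the join or falls
        -- back to a full table scan on big_table.
        EXPLAIN
        SELECT *
        FROM big_table b
        INNER JOIN small_table s
                ON b.user_goup = s.id;

    If the EXPLAIN output shows an access type of ref (or eq_ref) on big_table rather than ALL, the join is using the index instead of scanning the whole table.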



