Query with EXISTS-subquery taking about 280 ms in 80 % of cases, and less than 1 ms in 20 % of cases



  • I noticed a query that seemed to take way too long most of the time, so I investigated it with the Laravel query logger (to get the time and the actual SQL from the query builder).

    Excerpt:

    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::findAllByNameOrAlias('Frankfurt University of Applied Sciences'); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where (`name` = ? or exists (select * from `company_aliases` where `companies`.`id` = `company_aliases`.`company_id` and `name` = ?)) and `companies`.`deleted_at` is null",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 278.46,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::findAllByNameOrAlias('Frankfurt University of Applied Sciences'); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where (`name` = ? or exists (select * from `company_aliases` where `companies`.`id` = `company_aliases`.`company_id` and `name` = ?)) and `companies`.`deleted_at` is null",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.72,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::findAllByNameOrAlias('Frankfurt University of Applied Sciences'); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where (`name` = ? or exists (select * from `company_aliases` where `companies`.`id` = `company_aliases`.`company_id` and `name` = ?)) and `companies`.`deleted_at` is null",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.67,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::findAllByNameOrAlias('Frankfurt University of Applied Sciences'); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where (`name` = ? or exists (select * from `company_aliases` where `companies`.`id` = `company_aliases`.`company_id` and `name` = ?)) and `companies`.`deleted_at` is null",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 298.88,
         ],
       ]
    

    Notice the query times.

    The use case: I am looking for companies by name, but we want to be able to specify aliases that identify the same company. The companies table has an index on name, and the company_aliases table has an index on name, too. company_aliases.company_id has a foreign key constraint referencing companies.id.

    Cherry on top: The table company_aliases is empty. EXPLAIN for the above query:

    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY companies ALL companies_name_index NULL NULL NULL 241068 Using where
    2 MATERIALIZED company_aliases ref company_aliases_company_id_foreign,company_aliases_name_index company_aliases_name_index 1022 const 1 Using index condition

    Leaving out the company_aliases table and searching in companies only, gives sub 1 ms run times consistently.

    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.84,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.95,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.83,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.73,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.6,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.9,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.86,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.64,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.53,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.47,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.55,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.53,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.66,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.79,
         ],
       ]
    >>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
    => [
         [
           "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
           "bindings" => [
             "Frankfurt University of Applied Sciences",
           ],
           "time" => 0.84,
         ],
       ]
    

    A search in company_aliases is as fast as a search in companies, so it's a viable workaround to query companies and, if no result was found, query company_aliases. The total query time is still below 2 ms. Essentially, I am just moving the lazy evaluation of the OR into my PHP code. But this shouldn't be faster than executing the same logic entirely in the DB.

    What's the reason?



  • "A search in company_aliases is as fast as a search in companies, so it's a viable workaround to query companies and, if no result was found, query company_aliases. The total query time is still below 2 ms. Essentially, I am just moving the lazy evaluation of the OR into my PHP code. But this shouldn't be faster than executing the same logic entirely in the DB." That's not the same logic. The query is saying give me the rows in companies that either have the same name as this value or can be found in company_aliases with this value. There's nothing which tells it to stop when it finds a match.

    You can expand the OR condition manually:

    select * 
    from  `companies` 
    where  `name` = ? 
    and    `companies`.`deleted_at` is null
    union all
    select * 
    from  `companies` 
    where  exists (select null 
                   from   `company_aliases` 
                   where  `companies`.`id` = `company_aliases`.`company_id` 
                   and    `name` = ?
                  )
    and    `companies`.`deleted_at` is null
    and   ( `name` <> ? or `name` is null)
    

    This should give you a plan which uses an index on companies.name to satisfy the first part, and it should use an index on company_aliases.name followed by a loop to companies based on id for the second part. Assuming those indexes exist, you should be fine.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2