Getting the last few rows of a large table with query that is using JOINs



  • I'd like to get the last (or last few) rows of a table (named CONTENT) containing more than 10M rows. The query contains joins on 2 other tables and it is extremely slow. These are the table definitions and my query:

    CREATE TABLE `USER` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `value` varchar(64) DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY (`value`)
    ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
    

    CREATE TABLE GUID (
    id int(11) unsigned NOT NULL AUTO_INCREMENT,
    value char(36) DEFAULT NULL,
    PRIMARY KEY (id),
    KEY value (value)
    ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

    CREATE TABLE CONTENT (
    id int(11) unsigned NOT NULL AUTO_INCREMENT,
    user_id int(11) unsigned DEFAULT NULL,
    guid_id int(11) unsigned DEFAULT NULL,
    timestamp datetime DEFAULT NULL,
    PRIMARY KEY (id),
    KEY guid (guid_id),
    KEY user_id (user_id),
    KEY timestamp (timestamp),

    CONSTRAINT CONTENT_ibfk_4 FOREIGN KEY (guid_id) REFERENCES GUID (id) ON DELETE CASCADE ON UPDATE CASCADE,
    CONSTRAINT CONTENT_ibfk_5 FOREIGN KEY (user_id) REFERENCES USER (id) ON DELETE CASCADE ON UPDATE CASCADE
    ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

    Query:

    SELECT 
    `CONTENT`.`id`,
    `GUID`.`value` AS `guid_value`, 
    `USER`.`value` AS `user_value` 
    FROM `CONTENT`, `USER`, `GUID` 
    WHERE `CONTENT`.`user_id` = `USER`.`id` 
    AND `CONTENT`.`guid_id` = `GUID`.`id` 
    ORDER BY `CONTENT`.`timestamp` DESC LIMIT 1
      # even without ORDER BY the query is slow as seen by explain command
    

    These are the results of explain command copied as INSERT:

    +------+-------------+---------+--------+---------------+---------+---------+-----------------------------+-------+----------------------------------------------+
    | id   | select_type | table   | type   | possible_keys | key     | key_len | ref                         | rows  | Extra                                        |
    +------+-------------+---------+--------+---------------+---------+---------+-----------------------------+-------+----------------------------------------------+
    |    1 | SIMPLE      | GUID    | index  | PRIMARY       | value   | 37      | NULL                        | 16329 | Using index; Using temporary; Using filesort |
    |    1 | SIMPLE      | CONTENT | ref    | guid,user_id  | guid    | 5       | MANAGER.GUID.id             | 293   | Using where                                  |
    |    1 | SIMPLE      | USER    | eq_ref | PRIMARY       | PRIMARY | 4       | MANAGER.CONTENT.user_id     | 1     |                                              |
    +------+-------------+---------+--------+---------------+---------+---------+-----------------------------+-------+----------------------------------------------+
    

    The query is unusable so I am splitting it in 2 queries. First, I retrieve the CONTENT.id of interest and second, I plug in an additional WHERE CONTENT.id = x clause in the SELECT statement. It seems that in original query MariaDB optimizer does not understand that I only need 1 row, so it makes the cartesian product with every row in the GUID table. Is splitting the query in 2 subqueries the way to go? Can someone comfirm that cartesian product operation is indeed the operation that is causing problems? (first row of explain command results)

    EXPLAIN of Ricks query:

    +------+-------------+------------+--------+----------------------+-------------------+---------+-----------------------------+---------+-------------+
    | id   | select_type | table      | type   | possible_keys        | key               | key_len | ref                         | rows    | Extra       |
    +------+-------------+------------+--------+----------------------+-------------------+---------+-----------------------------+---------+-------------+
    |    1 | PRIMARY     |  | ALL    | NULL                 | NULL              | NULL    | NULL                        | 2       |             |
    |    1 | PRIMARY     | CONTENT    | eq_ref | PRIMARY,guid,user_id | PRIMARY           | 4       | c.id                        | 1       | Using where |
    |    1 | PRIMARY     | USER       | eq_ref | PRIMARY              | PRIMARY           | 4       | MANAGER.CONTENT.user_id     | 1       |             |
    |    1 | PRIMARY     | GUID       | eq_ref | PRIMARY              | PRIMARY           | 4       | MANAGER.CONTENT.guid_id     | 1       |             |
    |    2 | DERIVED     | CONTENT    | index  | NULL                 | timestamp         | 6       | NULL                        | 9474301 | Using index |
    +------+-------------+------------+--------+----------------------+-------------------+---------+-----------------------------+---------+-------------+
    


  • Please see how fast this runs and what it's Explain says:

    SELECT  content.`id`, `GUID`.`value` AS `guid_value`, `USER`.`value` AS `user_value`
        FROM  ( SELECT id FROM content ORDER BY timestamp DESC LIMIT 1 ) as c
        JOIN content  ON content.id = c.id
        JOIN `USER`  ON user.id = content.user_id
        JOIN `GUID`  ON guid.id = content.guid_id
    



Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2