Slow query with multiple INNER JOIN (via table)



  • This is my first question here.

    I have a project written in PHP running on windows server and apache and mysql 8. There is functionality to show news by tags (many-to many).

    So I have three InnoDB tables:

    news (id, title ...) news_tag_assn (news_id, tag_id) tag (id, name)

    I use a query to filter news containing exactly two tags:

        SELECT news.* FROM news 
        INNER JOIN news_tag_assn ON news.id = news_tag_assn.news_id 
        INNER JOIN tag ON news_tag_assn.tag_id = tag.id 
        WHERE  (tag.name IN ('Corona', 'Egypt')) AND (news.category_id <> 14) AND (news.visible=1)
        GROUP BY news.id, news.category_id, news.visible 
        HAVING COUNT(*) = 2 
        ORDER BY news.created DESC, news.updated DESC LIMIT 5
    

    I have created some indexes:

    For news_tag_assn table:

    • idx_news_tag_assn (news_id, tag_id)
    • idx_tag_news_assn (tag_id, news_id)

    For tag table:

    • tag_id_name (id, name)

    For news table:

    • idx_news_visible_tags (id, category_id, visible)
    • idx_cat_visible (category_id, visible)

    After creating indexes query become faster (0.144ms vs > 1 sec.)

    Another issue is counting news for pagination:

        SELECT COUNT(*) FROM (SELECT news.* FROM news 
        INNER JOIN news_tag_assn ON news.id = news_tag_assn.news_id 
        INNER JOIN tag ON news_tag_assn.tag_id = tag.id 
        WHERE  (tag.name IN ('Corona', 'Egypt')) AND (news.category_id <> 14) AND (news.visible=1)
        GROUP BY news.id, news.category_id, news.visible 
        HAVING COUNT(*) = 2 ) c
    

    This query is slow than first in 2 times.

    But i have a doubt that i have made all correctly. Maybe someone can advice something how can I make this query more fast.

    UPDATE

    SHOW CREATE TABLE news:

    CREATE TABLE `news` (
      `id` int NOT NULL AUTO_INCREMENT,
      `title` text NOT NULL,
      `alias` text,
      `short_description` varchar(200) DEFAULT NULL,
      `description` varchar(765) DEFAULT NULL,
      `content` longtext COMMENT,
      `label` varchar(255) DEFAULT NULL,
      `category_id` int NOT NULL COMMENT,
      `thumb` varchar(255) DEFAULT NULL,
      `type` int DEFAULT '1',
      `link` varchar(255) DEFAULT NULL,
      `priority` int DEFAULT '0',
      `creator_id` int DEFAULT NULL,
      `layout` varchar(255) DEFAULT NULL,
      `internal_layout` varchar(255) DEFAULT NULL,
      `created` int DEFAULT NULL,
      `updated` int DEFAULT NULL,
      `visible` int NOT NULL DEFAULT '1',
      `custom_bundles` int DEFAULT '0',
      PRIMARY KEY (`id`,`category_id`,`visible`) USING BTREE,
      KEY `idx_news_visible_for_tags` (`id`,`category_id`,`visible`),
      KEY `idx_cat_visible` (`category_id`,`visible`),
      KEY `idx_date` (`created`,`updated`)
    ) ENGINE=InnoDB AUTO_INCREMENT=22269 DEFAULT CHARSET=utf8mb3
    

    SHOW CREATE TABLE news_tag_assn:

    CREATE TABLE `news_tag_assn` (
      `news_id` int NOT NULL,
      `tag_id` int NOT NULL,
      PRIMARY KEY (`tag_id`,`news_id`),
      KEY `idx_news_tag_assn` (`news_id`,`tag_id`) USING BTREE,
      KEY `idx_tag_news_assn` (`tag_id`,`news_id`) USING BTREE
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
    

    SHOW CREATE TABLE tag:

    CREATE TABLE `tag` (
      `id` int NOT NULL AUTO_INCREMENT,
      `name` varchar(255) NOT NULL,
      `alias` varchar(255) NOT NULL,
      `group_id` int DEFAULT NULL,
      `frequency` int DEFAULT '0',
      `visible` int DEFAULT '0',
      PRIMARY KEY (`id`,`name`) USING BTREE,
      KEY `tag_id` (`id`) USING BTREE,
      KEY `tag_id_name` (`id`,`name`)
    ) ENGINE=InnoDB AUTO_INCREMENT=156 DEFAULT CHARSET=utf8mb3
    


    • Your goal is to page through articles that are tagged with both Corona in Egypt?
    • Please provide SHOW CREATE TABLE, not verbiage. With the SHOW, I can see whether you have the optimal indexes. (Hint: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table )
    • I have found that it is better to simply use the tag string instead of normalizing it to an ID. This would eliminate the need for a many-to-many table, which is part of the performance problem.
    • What table is updated in? I can't optimize a query without knowing what table each column is in.
    • After taking care of those, we can revisit whether the HAVING COUNT(*) = 2 pattern is optimal.
    • You mentioned "pagination", but I don't see OFFSET. Tip: http://mysql.rjweb.org/doc.php/Pagination

    Tip: Since the PRIMARY KEY of a table is an index (and UNIQUE), there is not need to have an index the same as the PK. (Example: idx_news_visible_for_tags.)

    Toss both of the KEYs of tags (but keep the PRIMARY KEY) and add INDEX(name). This may be the best performance improvement.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2