how to properly index a varchar column that won't use LIKE



  • I've been experimenting with EXPLAIN and noticed that the row count was big for our query so I tried adding an index to the VARCHAR column. First I added FULLTEXT but it ended up just making the query longer so I checked it with EXPLAIN and it was still showing all the rows. but when I tried to do EXPLAIN on the same table but using another indexed column which is an INT the row count was only 1.

    I also tried changing from FULLTEXT to just a normal INDEX but it still showed the same result.

    Will VARCHAR always result in a high row count even if you INDEX it? Is there any way to improve its performance?

    Also, the column is not unique.

    Query Time Before FULLTEXT:

    'query' => 'select `id` from `pdf_packages` where `private_reference_number` = ? and `pdf_packages`.`deleted_at` is null limit 1',
        'bindings' => 
        array (
          0 => '558376067',
        ),
        'time' => 64.42,
    

    Query Time After FULLTEXT:

    'query' => 'select `id` from `pdf_packages` where `private_reference_number` = ? and `pdf_packages`.`deleted_at` is null limit 1',
        'bindings' => 
        array (
          0 => '292314725',
        ),
        'time' => 129.89,
    

    EXPLAIN WITH FULLTEXT:

    enter image description here

    EXPLAIN WITH INDEX:

    enter image description here

    EXPLAIN WITH INDEX INT COLUMN:

    enter image description here

    I've already removed the INDEX for the column "private_reference_number" so you won't see it below.

    SHOW CREATE TABLE:

    CREATE TABLE `pdf_packages` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `organiser_id` int(11) NOT NULL,
      `pro_event_id` int(11) NOT NULL,
      `pro_order_id` int(11) NOT NULL,
      `pro_order_item_id` int(11) NOT NULL,
      `package_id` int(11) NOT NULL,
      `package_item_item_id` int(11) NOT NULL,
      `private_reference_number` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
      `reference_index` int(11) NOT NULL,
      `pro_attendee_id` int(11) DEFAULT NULL,
      `member_card_id` int(11) DEFAULT NULL,
      `created_at` timestamp NULL DEFAULT NULL,
      `updated_at` timestamp NULL DEFAULT NULL,
      `deleted_at` timestamp NULL DEFAULT NULL,
      `is_printed` int(11) NOT NULL DEFAULT '0',
      `has_arrived` tinyint(4) NOT NULL DEFAULT '0',
      `arrival_time` datetime DEFAULT NULL,
      `seat_details` longtext COLLATE utf8_unicode_ci,
      `svg_seat_details` longtext COLLATE utf8_unicode_ci,
      `is_shared` tinyint(1) NOT NULL DEFAULT '0',
      `share_from_id` int(11) DEFAULT NULL,
      `owner_id` int(11) DEFAULT NULL,
      `gates` text COLLATE utf8_unicode_ci,
      `pro_order_edit_history_id` int(11) DEFAULT NULL,
      `hard_ticket_sent` tinyint(4) NOT NULL DEFAULT '0',
      `hard_ticket_sent_on` datetime DEFAULT NULL,
      `sent_by` int(11) DEFAULT NULL,
      `pdf_sent` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
      `from_access` tinyint(1) NOT NULL DEFAULT '0',
      `pro_promo_code_id` int(11) DEFAULT NULL,
      `printed_at` datetime DEFAULT NULL,
      `bundle_index` int(11) NOT NULL DEFAULT '0',
      PRIMARY KEY (`id`),
      KEY `pdf_packages_organiser_id_pro_event_id_pro_order_id_index` (`organiser_id`,`pro_event_id`,`pro_order_id`),
      KEY `pdf_packages_pro_order_item_id_package_id_index` (`pro_order_item_id`,`package_id`),
      KEY `pdf_packages_package_item_item_id_pro_attendee_id_index` (`package_item_item_id`,`pro_attendee_id`),
      KEY `pdf_packages_member_card_id_share_from_id_index` (`member_card_id`,`share_from_id`),
      KEY `pdf_packages_pro_order_edit_history_id_index` (`pro_order_edit_history_id`),
      KEY `pdf_packages_pro_attendee_id_index` (`pro_attendee_id`),
      KEY `pdf_packages_pro_order_item_id_index` (`pro_order_item_id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=184980 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci 
    


  • Given your query, I would add this index:

    ALTER TABLE pdf_packages ADD INDEX (private_reference_number, deleted_at);
    

    That matches the two conditions you have in your WHERE clause, so it should do the best job at reducing the number of examined rows. An index can help searching a VARCHAR column as well as an INT column.

    However, if your query searches for a value that happens to occur on most of the rows in the table, the optimizer may reason that using the index isn't going to help, so it just does a table-scan.

    By analogy, this is the reason why common words aren't included in the index at the back of a book. Why bother to list the word "the" in the index, followed by a list of every page number? It's quicker to just read the book cover to cover.

    In practice, MySQL's optimizer skips the index if the value you search for occurs on about 20% of the rows or more. This is not a documented threshold, just something I've noticed.

    If you think the optimizer has made a bad choice, you can override it by using an https://dev.mysql.com/doc/refman/8.0/en/index-hints.html . But most of the time, it makes the right choice.

    A FULLTEXT index will be used only if you use the MATCH() AGAINST() predicate. I don't see that in your query, so you don't need a FULLTEXT index.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2