MySQL Innodb, why does the size of primary index depend upon the order of insertion of records



  • I have a table with this schema

    CREATE TABLE `UserIdPhoneNo` (
    `userId` int NOT NULL,
    `phoneNo` bigint NOT NULL,
     PRIMARY KEY (`userId`,`phoneNo`)
     ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
    

    I am planning to store 1 Billion rows in this table. Queries are mostly reads and rarely writes. To make the reads faster, I am going to provide enough memory to innodb_buffer_pool so that entire index fits in-memory. I created a data set i.e. 1 Billion records in increasing order like this

    1,263
    1,264
    1,265
    2,266
    2,267
    2,268
    

    I loaded this data into this table. Index size was 30 GB. Now I shuffled this data using sort command with -R option, inserted this data again (truncated the table before inserting this data) to my surprise the index size was 48 GB. I used this query to find the size of the index

    select database_name, table_name, index_name, stat_value*@@innodb_page_size from mysql.innodb_index_stats where stat_name='size';
    

    Why do we have different size of the index with exact same set of data? Can we do something to fix it?



  • When you insert presorted data into a b-tree index, the leaf pages will tend to be more densely populated, because the index grows only on the trailing end. When you insert randomly ordered data, new rows will be inserted all over the place, potentially causing more page splits, which will end up more sparse as a result.

    Can we do something to fix it?

    There's nothing to "fix" so I would just leave it alone. Your idea that fitting the entire table in memory will materially improve performance is likely wrong. However, if you insist on wasting your time, see suggestions in https://dba.stackexchange.com/questions/110736/mysql-indexes-maintenance , for example.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2