MySQL 8 trailing spaces being evaluated in equals comparison


  • QA Engineer

    I am getting different behaviors after updating a 5.6 database to 8.

    On 5.6, if I run the following statement I get zero rows, which is what I expect to get:

    SELECT * FROM EntityCustomerContact where CustomerContactID <> TRIM(CustomerContactID);
    

    On 8.0 I get hundreds of rows returned. CustomerContactID is a varchar(32);

    My understanding is that all CHAR, VARCHAR, and TEXT values in MySQL are compared without regard to any trailing spaces.

    The update process consisted of importing a mysqldump from 5.6 into 8.0. The only other thing that I changed other than moving to 8.0 was updating the collation on all tables and columns from utf8mb4_unicode_ci to utf8mb4_0900_ai_ci. This is the first upgrade I've done moving to the utf8mb4_0900_ai_ci collation.

    I have performed a number of other upgrades from 5.6/7 to 8.0 and never encountered this issue - I ran my test query on some other 8.0 (specifically 8.0.29) databases that were upgraded in the same fashion excepting the collation change and received the expected results.

    I have searched for possible mysqld settings to address this but have come up short.



  • I've been wondering about this. Perhaps this explains it (from the changelog for 8.0.1):

    ----- 2017-04-10 8.0.1 Development Milestone -- Character Set Support -- -----

    The pad attribute for Unicode 9.0.0 collations was changed from PAD SPACE to NO PAD. Consequently, these collations now treat spaces at the end of strings like any other character. The affected collations have names that contain the string 0900.

    Comparisons of /doc/refman/8.0/en/char.html columns that have a 9.0.0 collation differ from other collations with respect to trailing spaces. For example, 'a' and 'a ' compare as different strings, not the same string. Example:

    mysql> SET NAMES 'latin1' COLLATE 'latin1_swedish_ci';

    mysql> SELECT 'a' = 'a ';
    +------------+
    | 'a' = 'a ' |
    +------------+
    |          1 |
    +------------+
    mysql> SET NAMES 'utf8mb4' COLLATE 'utf8mb4_0900_ai_ci';
    mysql> SELECT 'a' = 'a ';
    +------------+
    | 'a' = 'a ' |
    +------------+
    |          0 |
    +------------+
    

    The INFORMATION_SCHEMA /doc/refman/8.0/en/collations-table.html table now has a PAD_ATTRIBUTE column that indicates the pad attribute for each collation.

    A problem with the latin1_de collation involving early weight string truncation has been corrected. The only likely effect is for /doc/refman/8.0/en/string-functions.html#function_weight-string function results.

    I think the bottom line is to avoid CHAR; stick with the VARCHAR. I don't think COLLATION has much, if anything, to do with the issue; rather "Padding".


Log in to reply
 

Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2