E
Under no conditions will MySQL generate these 6 characters: \u00e9 from a single character.
', also stores é as \u00e9
I think that happened in your client, not MySQL.
So that column A and column B now use the exact same character set + collation
There is no problem with different columns having diff charset and/or collation. A performance problem can occur when you compare columns with different collations (especially in JOIN...ON).
l'\u00e9l\u00e9phant, you don't get l'éléphant, but l'éléphant
That is inconclusive. Note that display products, especially HTML, will "clean up" things for you. To really see what is in the table, use SELECT HEX(col)...
' -- "HTML entity"
\u00e9 -- "unicode" encoding
l'éléphant encoded in UTF-8 and displayed in hex (with spaces added to separate characters):
Double encoding: 6C 27 C383C2A9 6C C383C2A9 70 68 61 6E 74
UTF-8 encoding: 6C 27 C3A9 6C C3A9 70 68 61 6E 74
latin1 encoding: 6C 27 E9 6C E9 70 68 61 6E 74
text: l ' é l é p h a n t
I'm getting the data in PHP, and ...
But where is the data coming from? mysqli_set_charset is stating that it is utf8mb4-encoded, but is it really?
code a search and replace
If you rush into this, you could be making things worse. First let's find out what is really there, where it came from, etc.
I'm
That is proper in either of these string literals:
'I\'m'
"I\'m"
The language (PHP/MySQL/etc) will remove the backslash as it parses the string. But it is 'wrong' in other contexts.
it escapes the single quote
What escapes it?? prepare+execute? real_escape? addslashes? Something else? As implied above, you do need to escape it. But we need to know what did the escaping -- to avoid messing things up further.
Even if the data is stored as Hello, I'm James ...
You should not let it store that way. That just adds to the confusion later. Ditto for ' and \u00e9. The MySQL table should contain l'éléphant. I repeat, the only way to see if that is what it stored is via SELECT HEX(col) .... And expect "6C 27 C3A9 6C C3A9 70 68 61 6E 74" (minus spaces).
A test:
mysql> INSERT INTO try_json (j) VALUES ('["I\'m"]');
mysql> INSERT INTO try_json (j) VALUES ('["l\'éléphant"]');
mysql> SELECT j, HEX(j), JSON_EXTRACT(j, '$[0]'), HEX(JSON_EXTRACT(j, '$[0]')) FROM try_json;
+------------------+----------------------------------+-------------------------+------------------------------+
| j | HEX(j) | JSON_EXTRACT(j, '$[0]') | HEX(JSON_EXTRACT(j, '$[0]')) |
+------------------+----------------------------------+-------------------------+------------------------------+
| ["I'm"] | 5B2249276D225D | "I'm" | 2249276D22 |
| ["l'éléphant"] | 5B226C27C3A96CC3A97068616E74225D | "l'éléphant" | 226C27C3A96CC3A97068616E7422 |
+------------------+----------------------------------+-------------------------+------------------------------+
Usually you want this; without it, you bet the \unnnn codes:
json_encode($a, JSON_UNESCAPED_UNICODE)
Use urlencode() when you are going to put the string in a URL. That may be where %7C comes from.
PHP's htmlentities() can generate things like < and é. That last one is equivalent to '
In MySQL 8.0, you may need this technique:
select cast(unhex('224D6173746572262333393B7322') as char);
which yields "Master's" (including the quotes).
PHP and its output:
echo "";
$s = '"Master's"'; // with html entity
echo strlen($s), ' ', $s, ' ', bin2hex($s), " s - with html entity \n";
$t = '"Master's"'; // backslash and apostrophe
echo strlen($t), ' ', $t, ' ', bin2hex($t), " t - with backslash and apostrophe \n";
echo "";
14 "Master's" 224d6173746572262333393b7322 s - with html entity
10 "Master's" 224d6173746572277322 t - with backslash and apostrophe