Storing special characters (German, French, Spanish) in SQL Server result in weird strings
I'm trying to store the German ß in SQL Server, but it results in ÃŸ. For example "Gleimstraße" is stored as "GleimstraÃŸe"
But this also happens for other characters:
- "König-Karl-Straße" stored as "KÃ¶nig-Karl-StraÃŸe"
- "Quai André-Citroën" stored as "Quai AndrÃ CitroÃ«n"
- "Carrer dels Adreçadors stored as "Carrer dels AdreÃ§adors"
I looked at the database collation and it shows
SQL_Latin1_General_CP1_CI_AS. I then Googled and found https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.SQLServer.CommonDBATasks.Collation.html .
But I have no idea if my existing collation is causing the issue, or what I need to do to be able to store the special characters like ö, ß, é, ë, ç, ñ, í.
Indeed, I'm storing the strings in an
nvarcharcolumn. I send the data to the database via my .NET application. This is what I have in my web.config as the connection string:
I read values from a JSON file which I've passed into a
Newtonsoft.Json.Linq.JArrayand then pass it to a function, like below:
Public Shared Function InsertStreetId(ByVal street As String) As Integer
Dim streetId As Integer
Dim myConnection As SqlConnection = GetConnection() Dim cmd As New SqlCommand("INSERT INTO geo_streets(streetname) VALUES (@streetname)" + ";SELECT CAST(scope_identity() as int);", myConnection) cmd.Parameters.Add(New SqlParameter("@streetname", street)) Try myConnection.Open() streetId = CInt(cmd.ExecuteScalar) Catch ex As Exception Finally myConnection.Close() End Try Return streetId End Function
Somehow the data is being sent over as UTF-8 bytes. Because your database collation is
SQL_Latin1_General_CP1_CI_AS, that uses code page 1252 (Latin1) for
ßcharacter is encoded as bytes C3 9F. In code page 1252, those two bytes equate to the following characters: Ã and Ÿ.
You can see this for yourself by executing the following in SSMS or Visual Studio Code, etc:
SELECT CHAR(0xC3) + CHAR(0x9F);
One option to solve this would be to use a UTF-8 collation (i.e. one ending in
_UTF8) as the database default collation, which you can do since you are using SQL Server 2019.
If you can't, or simply do not want to, change the database default collation to a
_UTF8collation, then you will need to find a way to change the encoding / culture / locale / collation of the connection to match your database code page (i.e. Windows 1252). Fortunately code page 1252 does store those characters (i.e. ö, ß, é, ë, ç, ñ, í ), but if you need to store a greater range of characters, you might need to either switch to a UTF-8 collation (if sticking with
VARCHARdata), OR switch to using
NVARCHARcolumns / data (in which case you need to prefix your string literals, in the app, with an upper-case
N, and/or specify the
VARCHARdatatype for parameters, depending on how the data is being sent). Both of those options are Unicode, which can handle all characters. If switching to
NVARCHARdata and columns (which could be a much larger change), the database connectivity software should assume the encoding to be UTF-16 Little Endian without you needing to specify that.
Update 2 of the Question indicates two possible suspects:
I read values from a JSON file which I've passed into a Newtonsoft.Json.Linq.JArray
then pass it to a function:
cmd.Parameters.Add(New SqlParameter("@streetname", street))
Let's start with the second item since it's an issue even if it's not the source of this particular problem.
You are using the following constructor for
SqlParameter: https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlparameter.-ctor#system-data-sqlclient-sqlparameter-ctor(system-string-system-object) (which I'm guessing is the same as the infamous https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlparametercollection.addwithvalue method). The problem here is that the datatype is inferred from the value, and sometimes mistakes can be made when trying to guess the datatype. It is always better to specify the datatype explicitly. And you know the datatype. So, use the following instead (be sure to adjust the size of the parameter to match the actual column if not 500):
Dim param as New SqlParameter("@streetname", SqlDbType.NVarChar, 500) param.Value = street cmd.Parameters.Add(param)
You should make that change first to see if it fixes the problem (and then fix all code that creates parameters without explicitly setting the datatype). While it will be an improvement in either case, I suspect that it won't actually solve the problem. I think that it would have to infer the type being XML in order to convert to UTF-8 encoding, and I don't believe that would happen with just a name and no actual XML in the value.
The other item of interest is reading the data from a file via Json.NET. I wouldn't be surprised if you need to explicitly set the output encoding else it defaults to UTF-8. It appears that the file is being read correctly since ß does indeed encode into UTF-8 as the bytes C3 9F.
So, if fixing how the
SqlParameteris created doesn't fix the issue, then you need to look into setting the encoding of the JSON.
I highly recommend that you save the bytes of the value coming into your
InsertStreetIdmethod to a file for debugging, using something like:
BitConverter.ToString( BitConverter.GetBytes( street ) )
I suspect you will see a value such as:
4700 6C00 6500 6900 6D00 7300 7400 7200 6100 C300 7801 6500
which is the UTF-16 LE representation of
N'GleimstraÃŸe'. If this is the case, then the issue is happening prior to the
InsertStreetIdmethod being called, which means this has nothing to do with SQL Server.