Storing special characters (German, French, Spanish) in SQL Server result in weird strings

  • I'm trying to store the German ß in SQL Server, but it results in ß. For example "Gleimstraße" is stored as "Gleimstraße"

    But this also happens for other characters:

    • "König-Karl-Stra­ße" stored as "König-Karl-Straße"
    • "Quai André-Citroën" stored as "Quai André Citroën"
    • "Carrer dels Adreçadors stored as "Carrer dels Adreçadors"

    I looked at the database collation and it shows SQL_Latin1_General_CP1_CI_AS. I then Googled and found .

    But I have no idea if my existing collation is causing the issue, or what I need to do to be able to store the special characters like ö, ß, é, ë, ç, ñ, í.

    UPDATE 1

    Indeed, I'm storing the strings in an nvarchar column. I send the data to the database via my .NET application. This is what I have in my web.config as the connection string:

    UPDATE 2

    I read values from a JSON file which I've passed into a Newtonsoft.Json.Linq.JArray and then pass it to a function, like below:

        InsertStreetId(item.SelectToken("location.street").Value(Of String))

    Public Shared Function InsertStreetId(ByVal street As String) As Integer
    Dim streetId As Integer

        Dim myConnection As SqlConnection = GetConnection()
        Dim cmd As New SqlCommand("INSERT INTO geo_streets(streetname) VALUES (@streetname)" +
                             ";SELECT CAST(scope_identity() as int);", myConnection)
        cmd.Parameters.Add(New SqlParameter("@streetname", street))
            streetId = CInt(cmd.ExecuteScalar)
        Catch ex As Exception
        End Try
        Return streetId
    End Function

  • Somehow the data is being sent over as UTF-8 bytes. Because your database collation is SQL_Latin1_General_CP1_CI_AS, that uses code page 1252 (Latin1) for VARCHAR data. The ß character is encoded as bytes C3 9F. In code page 1252, those two bytes equate to the following characters: Ã and Ÿ.

    You can see this for yourself by executing the following in SSMS or Visual Studio Code, etc:

    SELECT CHAR(0xC3) + CHAR(0x9F);

    One option to solve this would be to use a UTF-8 collation (i.e. one ending in _UTF8) as the database default collation, which you can do since you are using SQL Server 2019.

    If you can't, or simply do not want to, change the database default collation to a _UTF8 collation, then you will need to find a way to change the encoding / culture / locale / collation of the connection to match your database code page (i.e. Windows 1252). Fortunately code page 1252 does store those characters (i.e. ö, ß, é, ë, ç, ñ, í ), but if you need to store a greater range of characters, you might need to either switch to a UTF-8 collation (if sticking with VARCHAR data), OR switch to using NVARCHAR columns / data (in which case you need to prefix your string literals, in the app, with an upper-case N, and/or specify the NVARCHAR instead of VARCHAR datatype for parameters, depending on how the data is being sent). Both of those options are Unicode, which can handle all characters. If switching to NVARCHAR data and columns (which could be a much larger change), the database connectivity software should assume the encoding to be UTF-16 Little Endian without you needing to specify that.

    Update 2 of the Question indicates two possible suspects:

    1. I read values from a JSON file which I've passed into a Newtonsoft.Json.Linq.JArray


    2. then pass it to a function:

      cmd.Parameters.Add(New SqlParameter("@streetname", street))

    Let's start with the second item since it's an issue even if it's not the source of this particular problem.

    You are using the following constructor for SqlParameter: (which I'm guessing is the same as the infamous method). The problem here is that the datatype is inferred from the value, and sometimes mistakes can be made when trying to guess the datatype. It is always better to specify the datatype explicitly. And you know the datatype. So, use the following instead (be sure to adjust the size of the parameter to match the actual column if not 500):

    Dim param as New SqlParameter("@streetname", SqlDbType.NVarChar, 500)
    param.Value = street

    You should make that change first to see if it fixes the problem (and then fix all code that creates parameters without explicitly setting the datatype). While it will be an improvement in either case, I suspect that it won't actually solve the problem. I think that it would have to infer the type being XML in order to convert to UTF-8 encoding, and I don't believe that would happen with just a name and no actual XML in the value.

    The other item of interest is reading the data from a file via Json.NET. I wouldn't be surprised if you need to explicitly set the output encoding else it defaults to UTF-8. It appears that the file is being read correctly since ß does indeed encode into UTF-8 as the bytes C3 9F.

    So, if fixing how the SqlParameter is created doesn't fix the issue, then you need to look into setting the encoding of the JSON.

    I highly recommend that you save the bytes of the value coming into your InsertStreetId method to a file for debugging, using something like:

    BitConverter.ToString( BitConverter.GetBytes( street ) )

    I suspect you will see a value such as:

    4700 6C00 6500 6900 6D00 7300 7400 7200 6100 C300 7801 6500

    which is the UTF-16 LE representation of N'Gleimstraße'. If this is the case, then the issue is happening prior to the InsertStreetId method being called, which means this has nothing to do with SQL Server.

Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2