Crash of MSSQL CHECKDB with EXCEPTION_ACCESS_VIOLATION



  • I have a db with some consistency errors. Below the messages I get when issuing a CHECKDB on it with repair_rebuild option.

    CHECKDB found 0 allocation errors and 16 consistency errors in database 'NAMEREPLACED'.                              │2022-03-21 15:47:12.76 spid51      Stack Signature for the dump is 0x00000000951
    repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (NAMEREPLACED, repair_rebuild).
    

    And the log content...
    root@mssql:/var/opt/mssql/log# cat SQLDump0001.
    cat: SQLDump0001.: No such file or directory
    root@mssql:/var/opt/mssql/log# cat SQLDump0001.txt
    Current time is 15:47:12 03/21/22.

       BugCheck Dump                                                                                             
    

    =====================================================================

    This file is generated by Microsoft SQL Server
    version 15.0.4198.2
    upon detection of fatal unexpected error. Please return this file,
    the query or program that produced the bugcheck, the database and
    the error log, and any other pertinent information with a Service Request.

    4 X64 level 8664, 2 Mhz processor (s).
    Linux Distribution Ubuntu 20.04.4 LTS Release focal

    Memory
    MemoryLoad = 18%
    Total Physical = 8000 MB
    Available Physical = 6536 MB
    Total Page File = 8000 MB
    Available Page File = 6536 MB
    Total Virtual = 67108863 MB
    Available Virtual = 67104767 MB
    ***Stack Dump being sent to /var/opt/mssql/log/SQLDump0001.txt
    SqlDumpExceptionHandler: Process 51 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is
    terminating this process.


    • BEGIN STACK DUMP:
    • 03/21/22 15:47:12 spid 51
    • Exception Address = 0000000B83707C51 Module(sqlmin+0000000001637C51)
    • Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION
    • Access Violation occurred reading address 0000000000000000
    • Input Buffer 94 bytes -
    •          DBCC CHECKDB('NAMEREPLACED',repair_rebuild)  
      

    It is followed by a stack trace. This behavior was originally observed on SQL Server 2019 (with hotfixes) on a Windows Server 2019. I gave it a try on the Linux version of SQL Server 2019 and I got the same errors and stack trace. I suspect a bug. Windows version also crashing is :

    Windows version -> Microsoft SQL Server 2019 (RTM-CU15) (KB5008996) - 15.0.4198.2 (X64)   Jan 12 2022 22:30:08   Copyright (C) 2019 Microsoft Corporation  Standard Edition (64-bit) on Windows Server 2019 Standard 10.0  (Build 17763: ) (Hypervisor) 
    

    What do you think and how could I report this to MS ?

    I don't know how to report a bug to Microsoft. I don't find the procedure... Either from SSMS or from the website. Is there a public email address, tool or dedicated web platform for that ?

    Errors visible in the CHECKDB output are like that (but unfortunately in French) :

    Msg 8952, Niveau 16, État 1, Ligne 1
    Erreur de table : table 'sys.syscolpars' (ID 41). La ligne d'index de l'index 'nc' (ID 2) ne correspond à aucune ligne de données. Clés supplémentaires ou non valides pour :
    Msg 8956, Niveau 16, État 1, Ligne 1
    Ligne d'index (1:96551:30) avec des valeurs (id = 1059235274 and name = 'binary_message_body' and number = 0 and colid = 15) pointant vers la ligne de données identifiée par (id = 1059235274 and number = 0 and colid = 15).
    Msg 8952, Niveau 16, État 1, Ligne 1
    Erreur de table : table 'sys.syscolpars' (ID 41). La ligne d'index de l'index 'nc' (ID 2) ne correspond à aucune ligne de données. Clés supplémentaires ou non valides pour :
    Msg 8956, Niveau 16, État 1, Ligne 1
    Ligne d'index (1:96551:32) avec des valeurs (id = 1059235274 and name = 'conversation_handle' and number = 0 and colid = 5) pointant vers la ligne de données identifiée par (id = 1059235274 and number = 0 and colid = 5).
    

    Thank you for pointing out the function that raises the exception, that I would assume, should be caught in some way normally.

    The database has not been cloned, this happens on a database that is currently in use... I turned out that our Windows Server crashed during an automatic upgrade process and that it we had to hard reset the machine.

    Indeed, there is a consistency problem with it. What would be the best way to manually correct this ?

    I have access to healthy backups from 20 days ago, so I'm naively thinking about a kind of (dumb or smart) diff but I have serious doubts about the feasibility.

    We are definitely not experts in MS SQL where I work.



  • Exception Address = 0000000B83707C51 Module(sqlmin+0000000001637C51)

    This translates, in CU15, to sqlmin!CheckMetadataFactory::ValidateSingleColumnAttribute which would push me in the direction of something not being correct or proper with the underlying system table information for some specific columns. It makes me wonder if this database was cloned or otherwise have some system table corruption.

    Translated the errors from French to English:

    Msg 8952, Level 16, State 1, Line 1
    Table error: table 'sys.syscolpars' (ID 41). Index row of index 'nc' (ID 2) does not match any row of data. Additional or invalid keys for:
    Msg 8956, Level 16, State 1, Line 1
    Index row (1:96551:30) with values (id = 1059235274 and name = 'binary_message_body' and number = 0 and colid = 15) pointing to the row of data identified by (id = 1059235274 and number = 0 and colon = 15).
    Msg 8952, Level 16, State 1, Line 1
    Table error: table 'sys.syscolpars' (ID 41). Index row of index 'nc' (ID 2) does not match any row of data. Additional or invalid keys for:
    Msg 8956, Level 16, State 1, Line 1
    Index row (1:96551:32) with values (id=1059235274 and name='conversation_handle' and number=0 and colid=5) pointing to data row identified by (id=1059235274 and number=0 and package = 5). 
    

    This seems to correlate nicely with the AV as there seems to be some underlying metadata issues with service broker. To reiterate, I'm not sure if this was a cloned database or not - if not, there's most likely some system metadata problems. I don't believe Check* commands should AV, though, but these seem like two completely different issues:

    1. System metadata corruption/issue
    2. Check* commands not catching the AV (whether it should or not is up for debate)

    From your updates:

    turned out that our Windows Server crashed during an automatic upgrade process and that it we had to hard reset the machine.

    Indeed, there is a consistency problem with it. What would be the best way to manually correct this ?

    The best way would be to restore a copy of the database and attempt to restore the logs to get it up to date. Other options include creating a new database and copying the objects and data over.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2