How to make a Db2 database fully robust against "transaction log full" errors?


  • QA Engineer

    Db2 v11.5.7.0 on Linux, and for some reason the database hits the transaction log full error a few times per year.

    My solution so far was to increase logsecond, logprimary or logfilsiz, and I have now reached the foolish point where day-to-day log consumption is just 0.1%. Yet a few times per year the database still hits the transaction log full error.

    I investigated this problem in detail and below are step-by-step instructions for reproducing it. I created three cases. The first and second case work as expected, but case 3 still runs into the transaction log full problem.

    db2 "create database test1"
    mkdir /home/db2inst1/db2archlog
    db2 "update db cfg for test1 using logarchmeth1 DISK:/home/db2inst1/db2archlog"
    db2 "backup database test1 to /dev/null"
    db2 "update db cfg for test1 using logfilsiz 100"
    db2 "update db cfg for test1 using logprimary 3"
    db2 "update db cfg for test1 using logsecond 2"
    db2 "update db cfg for test1 using num_log_span 4"
    db2 "update db cfg for test1 using max_log 95"
    

    The above logfilsiz, logprimary and logsecond are intentionally set to very small values so that transaction log full problems can be demonstrated quickly.
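With these settings the total active log capacity is tiny, which a quick calculation shows (assuming the standard 4 KB Db2 log page size):

```shell
# Active log capacity implied by the settings above.
# Assumption: one Db2 log page is 4 KB.
LOGFILSIZ=100   # pages per log file
LOGPRIMARY=3    # primary log files
LOGSECOND=2     # secondary log files
PAGE_KB=4
TOTAL_KB=$(( LOGFILSIZ * PAGE_KB * (LOGPRIMARY + LOGSECOND) ))
echo "total active log capacity: ${TOTAL_KB} KB"   # 2000 KB, i.e. about 2 MB
```

So any workload holding more than roughly 2 MB of uncommitted log data will run into one of the limits configured above.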

    CASE 1

    db2 connect to test1
    db2 "create table admin.tab1 (col1 int)"
    db2 +c "INSERT INTO admin.tab1
    WITH temp1 (s1) AS
    (
    VALUES (0)
    UNION ALL
    SELECT s1 + 1
    FROM temp1
    WHERE s1 < 1000000
    )
    SELECT s1 FROM temp1"
    

    After a few seconds an error is returned:

    SQL1224N  The database manager is not able to accept new requests, has
    terminated all requests in progress, or has terminated the specified request
    because of an error or a forced interrupt.  SQLSTATE=55032
    

    Analyzing db2diag log:

    ADM1542W  Application "db2bp" with application handle
    "" and application id "*LOCAL.db2inst1.220512112327"
    executing under authentication id "DB2INST1" will be forced off of the
    database for violating database configuration parameter MAX_LOG
    (current value "95"). The unit of work will be rolled back.
    

    The transaction was trying to consume the whole transaction log, so the MAX_LOG limit of 95% was hit and the application was forced off the database. Works as expected: transaction log full was prevented successfully.
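A long-running unit of work approaching the MAX_LOG ceiling can also be observed before the force-off happens, for example with the MON_GET_UNIT_OF_WORK table function (a sketch; UOW_LOG_SPACE_USED is reported in bytes, and -1 means the current member):

```shell
db2 connect to test1
# Show log space consumed per open unit of work, biggest consumer first
db2 "SELECT APPLICATION_HANDLE, UOW_LOG_SPACE_USED
     FROM TABLE(MON_GET_UNIT_OF_WORK(NULL, -1))
     ORDER BY UOW_LOG_SPACE_USED DESC"
```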

    CASE 2

    In first Db2 session:

    db2 connect to test1
    db2 +c "insert into admin.tab1 values (1)"
    

    In second Db2 session:

    db2 connect to test1
    db2 "BEGIN
    DECLARE MESSAGE VARCHAR(100);
     DECLARE STATUS INTEGER;
     DECLARE I INTEGER;
     SET i=1;
     WHILE i < 50000 DO
     INSERT INTO ADMIN.TAB1 VALUES (i);
     COMMIT;
     SET i = i + 1;
     END WHILE;
    END"
    

    The above SQL inserts records one by one in a loop and commits each record individually. This can take a minute or two. The inserts complete successfully.

    Now back to first Db2 session and execute:

    db2 commit
    

    and an error is displayed:

    SQL1224N  The database manager is not able to accept new requests, has
    terminated all requests in progress, or has terminated the specified request
    because of an error or a forced interrupt.  SQLSTATE=55032
    

    Analyzing db2diag log:

    ADM1541W Application "db2bp" with application handle "0-216" and
    application id "*LOCAL.db2inst1.220512112650" executing under
    authentication id "DB2INST1" has been forced off of the database for
    violating database configuration parameter NUM_LOG_SPAN (current
    value "4"). The unit of work will be rolled back.
    

    Because the second session's row-by-row inserts kept filling log files while the first session's unit of work remained open, that open unit of work came to span more than the 4 log files allowed by NUM_LOG_SPAN. The database therefore forced off the oldest uncommitted transaction, which is the one in the first Db2 session. Transaction log full prevented successfully.
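The connection that NUM_LOG_SPAN would target, i.e. the holder of the oldest uncommitted transaction, can be identified up front (a sketch using the MON_GET_TRANSACTION_LOG table function; -1 means the current member):

```shell
db2 connect to test1
# Application handle of the connection holding the oldest open transaction,
# i.e. the one pinning the tail of the active log
db2 "SELECT APPLID_HOLDING_OLDEST_XACT
     FROM TABLE(MON_GET_TRANSACTION_LOG(-1))"
```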

    CASE 3

    In Db2 first session:

    db2 connect to test1
    db2 +c "INSERT INTO admin.tab1
    WITH temp1 (s1) AS
    (
    VALUES (0)
    UNION ALL
    SELECT s1 + 1
    FROM temp1
    WHERE s1 < 5000
    )
    SELECT s1 FROM temp1"
    

    Insert executes successfully.

    Open a second Db2 session and execute the same insert; it completes successfully. Open a third Db2 session, execute the same insert, and it returns:

    SQL0964C  The transaction log for the database is full.  SQLSTATE=57011
    

    Because there are many small uncommitted transactions, none of which hits the MAX_LOG or NUM_LOG_SPAN limit, the transaction log can still become saturated in a very short period of time: in the above example, under one minute.
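Saturation caused by many small transactions shows up only in the aggregate counters, not in any single unit of work; a quick check (a sketch; both columns are reported in bytes):

```shell
db2 connect to test1
# Total used vs. available active log space across all transactions
db2 "SELECT TOTAL_LOG_USED, TOTAL_LOG_AVAILABLE
     FROM TABLE(MON_GET_TRANSACTION_LOG(-1))"
```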

    Analyzing db2diag log:

    MESSAGE : ZRC=0x85100009=-2062548983=SQLP_NOSPACE "Log File has reached its    
    saturation point" DIA8309C Log file was full.
    

    I know I can increase LOGFILSIZ, LOGPRIMARY and LOGSECOND, and for one of the databases I have already done this quite aggressively: day-to-day applications consume a maximum of 0.1% of the transaction log, yet a few times a year some combination of application activity still gets into the transaction log full situation. I suspect there is some runaway application opening several hundred connections and not committing them. This event is so rare that it is very difficult to capture. We ran tons of application tests in the test environment and the problem never appeared.

    I know the only final solution is to track down the runaway application and fix the application bug. But some new application with the same problem can always appear. From the database perspective, I would like to implement some mechanism that prevents the transaction log full situation independently of the application layer.

    I know I can write a script that runs in a loop, monitors transaction log usage (for example with the MON_GET_TRANSACTION_LOG table function), and forces applications off the database when the transaction log is full or nearly full. But is there a simpler way, without scripts? Some Db2 parameter, or combination of parameters, to prevent the "case 3" transaction log saturation?
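For reference, such a watchdog can be only a few lines. This is a rough sketch (assumptions: it runs as the instance owner, the 90% threshold and 10-second interval are illustrative, and forcing the holder of the oldest transaction is the chosen victim policy):

```shell
#!/bin/sh
# Watchdog sketch: force off the holder of the oldest uncommitted
# transaction when active log usage crosses a threshold.
DB=test1
THRESHOLD=90          # percent of active log space used
db2 connect to "$DB" >/dev/null
while true; do
  USED=$(db2 -x "SELECT TOTAL_LOG_USED FROM TABLE(MON_GET_TRANSACTION_LOG(-1))")
  AVAIL=$(db2 -x "SELECT TOTAL_LOG_AVAILABLE FROM TABLE(MON_GET_TRANSACTION_LOG(-1))")
  PCT=$(( 100 * USED / (USED + AVAIL) ))
  if [ "$PCT" -ge "$THRESHOLD" ]; then
    # Victim: the application handle pinning the tail of the active log
    HANDLE=$(db2 -x "SELECT APPLID_HOLDING_OLDEST_XACT FROM TABLE(MON_GET_TRANSACTION_LOG(-1))")
    db2 "force application ( $HANDLE )"
  fi
  sleep 10
done
```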

    Regards



  • You could enable what's called "infinite logging" (https://www.ibm.com/docs/en/db2/11.5?topic=databases-space-requirements-log-files) by setting logsecond to -1, but the term is a bit misleading, and it will likely make your system less robust, not more.
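For completeness, enabling it is a one-line configuration change (shown against the test1 database from the question; not a recommendation):

```shell
# Infinite logging: logsecond = -1 removes the upper bound on secondary
# log files (logprimary still defines the preallocated files)
db2 "update db cfg for test1 using logsecond -1"
```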

    Since all hardware resources are finite, you are simply shifting your pain point to a different place: you now risk filling your archive log location and then your active log location, which will eventually cause an abend of the entire database instance instead of simply terminating the runaway transaction. If this happens you may have a hard time restarting the instance: you cannot do it until you free up some space, but the space is occupied by active log files that are needed for crash recovery, so you can't delete them; and if your primary and archive logs are on their own separate file systems, which is often the case, there is nothing else to delete from those file systems to free the space.

    Another downside of "infinite logging" is potentially much slower crash recovery. With "infinite logging" the database manager begins archiving active log files once it runs out of primary log space, which means that during recovery these log files must be retrieved from the archive location (which is typically slower than the active log location).

    In other words, the only good way to solve the problem is to find and fix the application that requires too much log space.



