How to detect log full condition

I have seen working with hundreds of customers a chronic LOG FULL condition and the shortcut people use is to increase the LOGPRIMARY and LOGSECOND. This solution works well for a single unit of work that can not fit into the total log space available.

So, first let’s calculate the total log space available:

LOG SIZE = (LOGPRIMARY + LOGSECOND) * 4 * LOGFILSIZ

For example, if my LOGPRIMARY=2, LOGSECOND=1 and LOGFILSIZ=4096, then my LOG SIZE = (2+1) * 4 * 4096 = 49, 152 KB = 48 MB.

If you look at the physical log files:

$ ls -l *.LOG
-rw------- 1 db2psc db2iadm 16785408 Mar 16 14:38 S0000000.LOG
-rw------- 1 db2psc db2iadm 16785408 Mar 16 14:21 S0000001.LOG
-rw------- 1 db2psc db2iadm 16785408 Mar 16 14:23 S0000002.LOG

If you look at the size of each log file, it is more by 8 KB for each file than what our actual calculations show. This is the overhead of the log space.

The LOG FULL conditions can appear most for the following two conditions:

The size of a single unit of work is greater than the actual size of the log files. In this case, if we are doing INSERTS and if each unit of work exceeds 48 MB, we will run out of the log space. This issue can be happily resolved by increasing all or a combination of database configurations parameters such as: LOGPRIMARY, LOGSECOND and LOGFILSIZ. This appears to solve the LOG FULL condition.
The second most common cause of the LOG FULL is not from the single unit of work exceeding all available log space. But, it is from the fact that a single transaction has started the unit of work and it has not yet finished with a result that the active log file is still open. You continue to do the work and when it is turn of this log file to be archived (archive logging) or use it again (circular logging), you will get LOG FULL condition. If you are using LOAD utility, the LOAD record gets logged and that log file will remain open for the duration of the LOAD command. During this condition, if a need arises to either archive the log or use the same file again, you will get LOG FULL condition.

You can use the following SQL to determine the connection (doing the work on behalf of the application) that is holding the oldest transaction that has not yet been committed. For this type of LOG FULL condition, it is best to kill the connection and let the work keep going.

SELECT MEMBER,
       TOTAL_LOG_AVAILABLE / 1048576 AS LOG_AVAILABLE_MB,
       TOTAL_LOG_USED / 1048576 AS LOG_USED_MB,
       CAST (((CASE WHEN (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) = 0
            OR (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) IS NULL
            OR TOTAL_LOG_AVAILABLE = -1 THEN NULL
            ELSE ((CAST ((TOTAL_LOG_USED) AS DOUBLE) / CAST (
               (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) AS DOUBLE))) * 100 
       END)) AS DECIMAL (5,2)) AS USED_PCT,
       APPLID_HOLDING_OLDEST_XACT
FROM TABLE (MON_GET_TRANSACTION_LOG(-2))
ORDER BY USED_PCT DESC;

You may see a sample output as shown below:

MEMBER LOG_AVAILABLE_MB     LOG_USED_MB          USED_PCT APPLID_HOLDING_OLDEST_XACT
------ -------------------- -------------------- -------- --------------------------
     0                    0                    46   98.37                        185

If you notice that the LOG used is more than 98% and the connection holding the log file hostage is 185. Find out who that person is and get it fixed to release the active log file to remove the log full condition.

If you want to go further and see how long this transaction has been on hold, run the following SQL.

select a.application_handle, 
       a.workload_occurrence_state as status, 
       substr(a.session_auth_id,1, 10) as authid,
       substr(c.application_name, 1, 10) as applname, 
       int(a.uow_log_space_used/1024/1024) as logusedM, 
       timestampdiff(4, char(current timestamp 
          - b.agent_state_last_update_time)) as idleformin 
from table(mon_get_unit_of_work(NULL,-2)) as a, 
     table(mon_get_agent(NULL,NULL,NULL,-2)) as b, 
     table(mon_get_connection(NULL, -2)) as c 
where a.application_handle = b.application_handle 
  and a.coord_member = b.member
  and a.coord_member = a.member 
  and b.agent_type = 'COORDINATOR'
  and a.uow_stop_time is null
  and a.application_handle = c.application_handle 
  and a.coord_member = c.member 
  and b.event_type = 'WAIT' 
  and b.event_object = 'REQUEST' 
  and b.event_state = 'IDLE';

The sample output is shown as below:

APPLICATION_HANDLE   STATUS                           AUTHID     APPLNAME      LOGUSEDM  IDLEFORMIN 
-------------------- -------------------------------- ---------- ---------- ----------- -----------
                 185 UOWWAIT                          DB2PSC     db2bp                0          72

If you notice that connection 185 has not used much log space (less than 1 MB) and it has been waiting since last 72 minutes. This is drag on the system and this has the potential of bringing the system down to its knees.

But, please note the SQL LOG FULL condition for condition number 2 as explained above depends upon the log file that is active and that can happen for variety of reasons. One of the other common cuase that I have seen is the bad programming in which ROLLBACK is not specified on an exception and commit never happened for that unit of the work. This situation will lead to LOG FULL condition if that transaction is not resolved.