More and more customers are now asking for some way to check the health of the DB2 pureScale system. Let’s just focus on few early diagnostics before we jump to the deep exploration at the SQL statement level. This is just akin going to primary health care physician to catch symptoms early.

The DB2 pureScale heart is Cluster Caching Facility (CF) and if this slows down, it will have effect on everything.

If we just focus early on the symptoms exhibited by CF, it is equivalent to finding the cure for the disease.

In my experience working with hundreds customers all over North America, almost 99.9% of the folks jump right to the SQL tuning. Their is nothing wrong in it but I consider it to be treating the symptom and than the disease. A particular SQL could as well be the real issue and I do not deny that but it is last on my to-do list to tune a system. Let’s get on track to check the health of pureScale system.

I am just copying few SQLs here that I have used in many situations to derive some conclusions on the health of the CF.

CF Health Check

If CF becomes the bottleneck, everything slows down. This is on my priority list to check its health. As per Steve Rees, a performance expert Guru, I have settled on the following 3 SQLs to examine CF health.

SLS and WARM Rate

SELECT CF_CMD_NAME, 
       DEC(CAST(SUM(TOTAL_CF_WAIT_TIME_MICRO) AS FLOAT) / 
       CAST(SUM(TOTAL_CF_REQUESTS) AS FLOAT),5,0) AS CF_CMD_RATE 
FROM   TABLE (SYSPROC.MON_GET_CF_WAIT_TIME(-2)) 
WHERE  ID = (select ID from sysibmadm.db2_cf where STATE = 'PRIMARY') 
AND    CF_CMD_NAME IN ('SetLockState', 'WriteAndRegisterMultiple') 
GROUP BY CF_CMD_NAME 
HAVING SUM(TOTAL_CF_WAIT_TIME_MICRO) > 0 
AND    SUM(TOTAL_CF_REQUESTS) > 0
;

In above SQL, we are just taking 2 CF commands SetLockState and WriteAndRegisterMultiple and finding out the average rate in micro seconds. The SetLockState (SLS) is a small packet and it should bea quick turnaround and gives a baseline for all other messages. The other message WriteAndRegisterMultiple (WARM) is a larger packet and will include XI time. High WARM times can indicate both heavier network activity and more page invalidation, and measuring WARM times periodically can reveal when statement times are being adversely affected. (Courtesy: Toby Haynes of IBM Lab in Toronto).

Overall CF Wait Rate

The following SQL is the average rate over all CF commands. This is another overall indicator but it has to be compared against a baseline that you collected in a healthy system.

-- AVG_CF_WT_MICRO is an indication if CF is the bottleneck
-- This is an average over all CF calls
-- Best way to judge good or bad number - Look for a change
-- from what is normal for your system

SELECT INT(SUM(RECLAIM_WAIT_TIME)) RECLAIMTIME_MILLI,
 INT(SUM(CF_WAITS)) AS NUM_CF_WAITS,
 INT(SUM(CF_WAIT_TIME)) CF_WAIT_MILLI, 
 CAST ((CASE WHEN SUM(CF_WAITS) = 0 THEN NULL 
 ELSE (INT(1000 * DEC(SUM(CAST(CF_WAIT_TIME AS FLOAT))/
 SUM(CAST(CF_WAITS AS FLOAT)),5,4)))
 END) AS DECIMAL(10,4)) AVG_CF_WT_MICRO 
FROM TABLE(SYSPROC.MON_GET_WORKLOAD('',-2)) AS t
;

However, the output from above two SQLs should be seen along with the page negotiation rate. The page negotiation refers to a page getting tossed around due to request made by another member. If the page negotiation rate is much higher, it will lead to performance degradation as more and more time is spent in shuffling / tossing around the page between members through CF. Use the following SQL to determine this rate.

Page Negotiation Rate

-- For example: If Member 'A' acquires a page P and modifies a row on it.
-- 'A' holds an exclusive page lock on page until 'A' commits
-- Member 'B' wants to modify a different row on the same page.
-- 'B' does not have to wait until 'A' commits
-- CF will negotiate the page back from 'A' on 'B's behalf.
-- Provides better concurrency - Good but excessive can cause
-- contention, low CPU usage, reduced throughput.

-- Recoomendations to reduce excessive page reclaim
-- What is excessive is debatable and sort it in desc order 
-- for all tables and pick top 5 tables
-- If excessive page reclaim then do these
-- Consider smaller page size
-- For small HOT tables with frequent updates, increase PCTFREE
-- PCTFREE will spread rows over more pages
-- Side effect - More space consumption (but this is small table anyway)
 
SELECT MEMBER, 
       SUBSTR(TABSCHEMA,1,12) AS SCHEMA, 
       SUBSTR(TABNAME,1,16) AS NAME,
       SUBSTR(OBJTYPE,1,16) AS TYPE,
      (PAGE_RECLAIMS_X + PAGE_RECLAIMS_S) AS PAGE_RECLAIMS, 
       RECLAIM_WAIT_TIME
FROM TABLE( MON_GET_PAGE_ACCESS_INFO(NULL,NULL, NULL) ) AS WAITMETRICS
ORDER BY SCHEMA, NAME
;

The output from the above 3 SQLs can be collected at 15 minute interval plot the time series graph and observe the trend. The increase in SLS and WARM rate by a factor of 2 to 3 with reference to a baseline will indicate that CF network is becoming the bottleneck.

Adding more adapters at the CF will help resolve this bottleneck.

Group Buffer Pool Full Condition

Use the following SQL to determine group buffer pool full condition. This condition will cause a stall condition in which synchronous I/O will run to flush the pages through members to make room in the Group Buffer Pool. This will lead to a burst of the I/O activity. Increasing the CF memory may delay the burst activity.

-- GBP Full Condition
-- Good value: < 5 per 10000 transactions
-- If higher, GBP is small
-- The castout engines might not be keeping up
-- SOFTMAX is set too high

WITH GBPC AS ( 
 SELECT 10000.0 * SUM(GBP.NUM_GBP_FULL) / 
        SUM(COMMIT_SQL_STMTS) AS GBP_FULL_CONDITION 
 FROM TABLE(MON_GET_GROUP_BUFFERPOOL(-2)) as GBP, SYSIBMADM.SNAPDB
) SELECT CASE WHEN GBP_FULL_CONDITION < 5.0 THEN 'GOOD VALUE' 
 ELSE 'GBP FULL CONDITION' END AS RESULT,
 CASE WHEN GBP_FULL_CONDITION < 5.0 THEN 'NO GBP FULL CONDITION' 
 ELSE 'INCREASE CF_GBP_SZ OR DECREASE SOFTMAX OR INCREASE NUM_IOSERVERS' 
 END AS RECOMMENDATION
 FROM GBPC 
;

CF Swap Usage

We may not be able to avoid few hundred pages of swap but excessive swapping will adversely affect the performance of the system as a whole. Keep an eye of the CF swap usage. If you are not dipping into swap space, you have kept one problem at bay.

WITH TYPES AS (SELECT NAME FROM SYSIBMADM.ENV_CF_SYS_RESOURCES 
               GROUP BY NAME)
SELECT A.ID, 
 SUBSTR(MIN(DECODE(T.NAME, 'HOST_NAME', A.VALUE)),1,36) HOST_NAME,
 INT(MIN(DECODE(T.NAME, 'MEMORY_TOTAL', A.VALUE)) -
 MIN(DECODE(T.NAME, 'MEMORY_FREE', A.VALUE))) MEMORY_IN_USE ,
 INT(MIN(DECODE(T.NAME, 'MEMORY_SWAP_TOTAL', A.VALUE)) -
 MIN(DECODE(T.NAME, 'MEMORY_SWAP_FREE', A.VALUE))) SWAP_IN_USE
FROM SYSIBMADM.ENV_CF_SYS_RESOURCES A, TYPES T
WHERE A.NAME = T.NAME
GROUP BY A.ID
;

Member Swap Usage

WITH TYPES AS (SELECT NAME FROM SYSIBMADM.ENV_SYS_RESOURCES 
 GROUP BY NAME)
SELECT SUBSTR(MIN(DECODE(T.NAME, 'HOST_NAME', A.VALUE)),1,36) HOST_NAME,
 INT(MIN(DECODE(T.NAME, 'MEMORY_TOTAL', A.VALUE)) -
 MIN(DECODE(T.NAME, 'MEMORY_FREE', A.VALUE))) MEMORY_IN_USE ,
 INT(MIN(DECODE(T.NAME, 'MEMORY_SWAP_TOTAL', A.VALUE)) -
 MIN(DECODE(T.NAME, 'MEMORY_SWAP_FREE', A.VALUE))) SWAP_IN_USE
FROM SYSIBMADM.ENV_SYS_RESOURCES A, TYPES T
WHERE A.NAME = T.NAME
;

SAN System Performance

The SAN system performance measurement is not accurately possible using DB2 but we can measure the rate at which pages are flushed to the disk from the buffer pool and from the DB2 logs. This is a crude indication of the performance of the I/O subsystem.

The castout is the term used on z/OS DB2 sysplex, which is similar to page cleaning in DB2 LUW. Calculate number of writes per transaction and calculate time per write. If this time is increasing over 10 ms, it is an indication that I/O subsystem is not able to keep-up.

-- Calculate number of writes / transactions (CASTOUTS_PER_TRANSACTION)
-- Calculate time per write (CASTOUT_TIME_MILLI_PER_TRANSACTION)
-- 
-- Bursty write activity may be a sign of SOFTMAX being high
-- Long Write Time (CASTOUT_TIME_MILLI_PER_TRANSACTION) is an
-- indication that I/O subsystem may not be able to keep up
-- 
-- Castout activity is influenced by
-- SOFTMAX - Lower value means faster group crash receovery but more
-- aggressive cleaning
-- Consider setting SOFTMAX higher than equivalent EE system

SELECT CASE WHEN SUM(W.TOTAL_APP_COMMITS) < 100 
       THEN NULL ELSE
 CAST( FLOAT(SUM(B.POOL_DATA_WRITES+B.POOL_INDEX_WRITES))
 / SUM(W.TOTAL_APP_COMMITS) AS DECIMAL(6,1)) END 
 AS "CASTOUTS_PER_TRANSACTION",
 CASE WHEN SUM(B.POOL_DATA_WRITES+B.POOL_INDEX_WRITES) < 1000 
    THEN NULL ELSE
 CAST( FLOAT(SUM(B.POOL_WRITE_TIME))
 / SUM(B.POOL_DATA_WRITES+B.POOL_INDEX_WRITES) AS DECIMAL(5,1)) END 
 AS "CASTOUT_TIME_MILLI_PER_TRANSACTION"
FROM TABLE(MON_GET_WORKLOAD(NULL,NULL)) AS W, 
TABLE(MON_GET_BUFFERPOOL(NULL,NULL)) AS B
;

If the value of CASTOUTS_PER_TRANSACTION is less than 10 and CASTOUT_TIME_MILLI_PER_TRANSACTION is less than 1 ms, the I/O subsystem is keeping well and indicates that it is a modern flash based SAN storage.

Size of Database

select member, 
 decimal(sum(double(tbsp_used_pages) * tbsp_page_size ) / 
 1024 / 1024, 10, 2 ) as db_mb_used 
from table( mon_get_tablespace(null, -2)) 
group by member
;

Detect Log Full Condition

SELECT MEMBER,
       TOTAL_LOG_AVAILABLE / 1048576 AS LOG_AVAILABLE_MB,
       TOTAL_LOG_USED / 1048576 AS LOG_USED_MB,
       CAST (((CASE WHEN (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) = 0
            OR (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) IS NULL
            OR TOTAL_LOG_AVAILABLE = -1 THEN NULL
            ELSE ((CAST ((TOTAL_LOG_USED) AS DOUBLE) / CAST (
               (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) AS DOUBLE))) * 100
       END)) AS DECIMAL (5,2)) AS USED_PCT,
       APPLID_HOLDING_OLDEST_XACT
FROM TABLE (MON_GET_TRANSACTION_LOG(-2))
ORDER BY USED_PCT DESC;

From above, find out the % of log space used and most importantly the application id that is holding a log file active, which may lead to a log full condition.

Detect Log Hog / Drag Application

select a.application_handle,
       a.workload_occurrence_state as status,
       substr(a.session_auth_id,1, 10) as authid,
       substr(c.application_name, 1, 10) as applname,
       int(a.uow_log_space_used/1024/1024) as logusedM,
       timestampdiff(4, char(current timestamp
          - b.agent_state_last_update_time)) as idleformin
from table(mon_get_unit_of_work(NULL,-2)) as a,
     table(mon_get_agent(NULL,NULL,NULL,-2)) as b,
     table(mon_get_connection(NULL, -2)) as c
where a.application_handle = b.application_handle
  and a.coord_member = b.member
  and a.coord_member = a.member
  and b.agent_type = 'COORDINATOR'
  and a.uow_stop_time is null
  and a.application_handle = c.application_handle
  and a.coord_member = c.member
  and b.event_type = 'WAIT'
  and b.event_object = 'REQUEST'
  and b.event_state = 'IDLE';

Use above SQL to find out the application that has been idle or waiting for a long time and this can hold a log file active.

DB2 pureScale Health Check