More and more customers are now asking for some way to check the health of the DB2 pureScale system. Let’s just focus on few early diagnostics before we jump to the deep exploration at the SQL statement level. This is just akin going to primary health care physician to catch symptoms early.
The DB2 pureScale heart is Cluster Caching Facility (CF) and if this slows down, it will have effect on everything.
If we just focus early on the symptoms exhibited by CF, it is equivalent to finding the cure for the disease.
In my experience working with hundreds customers all over North America, almost 99.9% of the folks jump right to the SQL tuning. Their is nothing wrong in it but I consider it to be treating the symptom and than the disease. A particular SQL could as well be the real issue and I do not deny that but it is last on my to-do list to tune a system. Let’s get on track to check the health of pureScale system.
I am just copying few SQLs here that I have used in many situations to derive some conclusions on the health of the CF.
CF Health Check
If CF becomes the bottleneck, everything slows down. This is on my priority list to check its health. As per Steve Rees, a performance expert Guru, I have settled on the following 3 SQLs to examine CF health.
SLS and WARM Rate
SELECT CF_CMD_NAME, DEC(CAST(SUM(TOTAL_CF_WAIT_TIME_MICRO) AS FLOAT) / CAST(SUM(TOTAL_CF_REQUESTS) AS FLOAT),5,0) AS CF_CMD_RATE FROM TABLE (SYSPROC.MON_GET_CF_WAIT_TIME(-2)) WHERE ID = (select ID from sysibmadm.db2_cf where STATE = 'PRIMARY') AND CF_CMD_NAME IN ('SetLockState', 'WriteAndRegisterMultiple') GROUP BY CF_CMD_NAME HAVING SUM(TOTAL_CF_WAIT_TIME_MICRO) > 0 AND SUM(TOTAL_CF_REQUESTS) > 0 ;
In above SQL, we are just taking 2 CF commands
WriteAndRegisterMultiple and finding out the average rate in micro seconds. The
SetLockState (SLS) is a small packet and it should bea quick turnaround and gives a baseline for all other messages. The other message
WriteAndRegisterMultiple (WARM) is a larger packet and will include XI time. High WARM times can indicate both heavier network activity and more page invalidation, and measuring WARM times periodically can reveal when statement times are being adversely affected. (Courtesy: Toby Haynes of IBM Lab in Toronto).
Overall CF Wait Rate
The following SQL is the average rate over all CF commands. This is another overall indicator but it has to be compared against a baseline that you collected in a healthy system.
-- AVG_CF_WT_MICRO is an indication if CF is the bottleneck -- This is an average over all CF calls -- Best way to judge good or bad number - Look for a change -- from what is normal for your system SELECT INT(SUM(RECLAIM_WAIT_TIME)) RECLAIMTIME_MILLI, INT(SUM(CF_WAITS)) AS NUM_CF_WAITS, INT(SUM(CF_WAIT_TIME)) CF_WAIT_MILLI, CAST ((CASE WHEN SUM(CF_WAITS) = 0 THEN NULL ELSE (INT(1000 * DEC(SUM(CAST(CF_WAIT_TIME AS FLOAT))/ SUM(CAST(CF_WAITS AS FLOAT)),5,4))) END) AS DECIMAL(10,4)) AVG_CF_WT_MICRO FROM TABLE(SYSPROC.MON_GET_WORKLOAD('',-2)) AS t ;
However, the output from above two SQLs should be seen along with the page negotiation rate. The page negotiation refers to a page getting tossed around due to request made by another member. If the page negotiation rate is much higher, it will lead to performance degradation as more and more time is spent in shuffling / tossing around the page between members through CF. Use the following SQL to determine this rate.
Page Negotiation Rate
-- For example: If Member 'A' acquires a page P and modifies a row on it. -- 'A' holds an exclusive page lock on page until 'A' commits -- Member 'B' wants to modify a different row on the same page. -- 'B' does not have to wait until 'A' commits -- CF will negotiate the page back from 'A' on 'B's behalf. -- Provides better concurrency - Good but excessive can cause -- contention, low CPU usage, reduced throughput. -- Recoomendations to reduce excessive page reclaim -- What is excessive is debatable and sort it in desc order -- for all tables and pick top 5 tables -- If excessive page reclaim then do these -- Consider smaller page size -- For small HOT tables with frequent updates, increase PCTFREE -- PCTFREE will spread rows over more pages -- Side effect - More space consumption (but this is small table anyway) SELECT MEMBER, SUBSTR(TABSCHEMA,1,12) AS SCHEMA, SUBSTR(TABNAME,1,16) AS NAME, SUBSTR(OBJTYPE,1,16) AS TYPE, (PAGE_RECLAIMS_X + PAGE_RECLAIMS_S) AS PAGE_RECLAIMS, RECLAIM_WAIT_TIME FROM TABLE( MON_GET_PAGE_ACCESS_INFO(NULL,NULL, NULL) ) AS WAITMETRICS ORDER BY SCHEMA, NAME ;
The output from the above 3 SQLs can be collected at 15 minute interval plot the time series graph and observe the trend. The increase in SLS and WARM rate by a factor of 2 to 3 with reference to a baseline will indicate that CF network is becoming the bottleneck.
Adding more adapters at the CF will help resolve this bottleneck.
Group Buffer Pool Full Condition
Use the following SQL to determine group buffer pool full condition. This condition will cause a stall condition in which synchronous I/O will run to flush the pages through members to make room in the Group Buffer Pool. This will lead to a burst of the I/O activity. Increasing the CF memory may delay the burst activity.
-- GBP Full Condition -- Good value: < 5 per 10000 transactions -- If higher, GBP is small -- The castout engines might not be keeping up -- SOFTMAX is set too high WITH GBPC AS ( SELECT 10000.0 * SUM(GBP.NUM_GBP_FULL) / SUM(COMMIT_SQL_STMTS) AS GBP_FULL_CONDITION FROM TABLE(MON_GET_GROUP_BUFFERPOOL(-2)) as GBP, SYSIBMADM.SNAPDB ) SELECT CASE WHEN GBP_FULL_CONDITION < 5.0 THEN 'GOOD VALUE' ELSE 'GBP FULL CONDITION' END AS RESULT, CASE WHEN GBP_FULL_CONDITION < 5.0 THEN 'NO GBP FULL CONDITION' ELSE 'INCREASE CF_GBP_SZ OR DECREASE SOFTMAX OR INCREASE NUM_IOSERVERS' END AS RECOMMENDATION FROM GBPC ;
CF Swap Usage
We may not be able to avoid few hundred pages of swap but excessive swapping will adversely affect the performance of the system as a whole. Keep an eye of the CF swap usage. If you are not dipping into swap space, you have kept one problem at bay.
WITH TYPES AS (SELECT NAME FROM SYSIBMADM.ENV_CF_SYS_RESOURCES GROUP BY NAME) SELECT A.ID, SUBSTR(MIN(DECODE(T.NAME, 'HOST_NAME', A.VALUE)),1,36) HOST_NAME, INT(MIN(DECODE(T.NAME, 'MEMORY_TOTAL', A.VALUE)) - MIN(DECODE(T.NAME, 'MEMORY_FREE', A.VALUE))) MEMORY_IN_USE , INT(MIN(DECODE(T.NAME, 'MEMORY_SWAP_TOTAL', A.VALUE)) - MIN(DECODE(T.NAME, 'MEMORY_SWAP_FREE', A.VALUE))) SWAP_IN_USE FROM SYSIBMADM.ENV_CF_SYS_RESOURCES A, TYPES T WHERE A.NAME = T.NAME GROUP BY A.ID ;
Member Swap Usage
WITH TYPES AS (SELECT NAME FROM SYSIBMADM.ENV_SYS_RESOURCES GROUP BY NAME) SELECT SUBSTR(MIN(DECODE(T.NAME, 'HOST_NAME', A.VALUE)),1,36) HOST_NAME, INT(MIN(DECODE(T.NAME, 'MEMORY_TOTAL', A.VALUE)) - MIN(DECODE(T.NAME, 'MEMORY_FREE', A.VALUE))) MEMORY_IN_USE , INT(MIN(DECODE(T.NAME, 'MEMORY_SWAP_TOTAL', A.VALUE)) - MIN(DECODE(T.NAME, 'MEMORY_SWAP_FREE', A.VALUE))) SWAP_IN_USE FROM SYSIBMADM.ENV_SYS_RESOURCES A, TYPES T WHERE A.NAME = T.NAME ;
SAN System Performance
The SAN system performance measurement is not accurately possible using DB2 but we can measure the rate at which pages are flushed to the disk from the buffer pool and from the DB2 logs. This is a crude indication of the performance of the I/O subsystem.
The castout is the term used on z/OS DB2 sysplex, which is similar to page cleaning in DB2 LUW. Calculate number of writes per transaction and calculate time per write. If this time is increasing over 10 ms, it is an indication that I/O subsystem is not able to keep-up.
-- Calculate number of writes / transactions (CASTOUTS_PER_TRANSACTION) -- Calculate time per write (CASTOUT_TIME_MILLI_PER_TRANSACTION) -- -- Bursty write activity may be a sign of SOFTMAX being high -- Long Write Time (CASTOUT_TIME_MILLI_PER_TRANSACTION) is an -- indication that I/O subsystem may not be able to keep up -- -- Castout activity is influenced by -- SOFTMAX - Lower value means faster group crash receovery but more -- aggressive cleaning -- Consider setting SOFTMAX higher than equivalent EE system SELECT CASE WHEN SUM(W.TOTAL_APP_COMMITS) < 100 THEN NULL ELSE CAST( FLOAT(SUM(B.POOL_DATA_WRITES+B.POOL_INDEX_WRITES)) / SUM(W.TOTAL_APP_COMMITS) AS DECIMAL(6,1)) END AS "CASTOUTS_PER_TRANSACTION", CASE WHEN SUM(B.POOL_DATA_WRITES+B.POOL_INDEX_WRITES) < 1000 THEN NULL ELSE CAST( FLOAT(SUM(B.POOL_WRITE_TIME)) / SUM(B.POOL_DATA_WRITES+B.POOL_INDEX_WRITES) AS DECIMAL(5,1)) END AS "CASTOUT_TIME_MILLI_PER_TRANSACTION" FROM TABLE(MON_GET_WORKLOAD(NULL,NULL)) AS W, TABLE(MON_GET_BUFFERPOOL(NULL,NULL)) AS B ;
If the value of CASTOUTS_PER_TRANSACTION is less than 10 and CASTOUT_TIME_MILLI_PER_TRANSACTION is less than 1 ms, the I/O subsystem is keeping well and indicates that it is a modern flash based SAN storage.
Size of Database
select member, decimal(sum(double(tbsp_used_pages) * tbsp_page_size ) / 1024 / 1024, 10, 2 ) as db_mb_used from table( mon_get_tablespace(null, -2)) group by member ;
Detect Log Full Condition
SELECT MEMBER, TOTAL_LOG_AVAILABLE / 1048576 AS LOG_AVAILABLE_MB, TOTAL_LOG_USED / 1048576 AS LOG_USED_MB, CAST (((CASE WHEN (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) = 0 OR (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) IS NULL OR TOTAL_LOG_AVAILABLE = -1 THEN NULL ELSE ((CAST ((TOTAL_LOG_USED) AS DOUBLE) / CAST ( (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) AS DOUBLE))) * 100 END)) AS DECIMAL (5,2)) AS USED_PCT, APPLID_HOLDING_OLDEST_XACT FROM TABLE (MON_GET_TRANSACTION_LOG(-2)) ORDER BY USED_PCT DESC;
Detect Log Hog / Drag Application
select a.application_handle, a.workload_occurrence_state as status, substr(a.session_auth_id,1, 10) as authid, substr(c.application_name, 1, 10) as applname, int(a.uow_log_space_used/1024/1024) as logusedM, timestampdiff(4, char(current timestamp - b.agent_state_last_update_time)) as idleformin from table(mon_get_unit_of_work(NULL,-2)) as a, table(mon_get_agent(NULL,NULL,NULL,-2)) as b, table(mon_get_connection(NULL, -2)) as c where a.application_handle = b.application_handle and a.coord_member = b.member and a.coord_member = a.member and b.agent_type = 'COORDINATOR' and a.uow_stop_time is null and a.application_handle = c.application_handle and a.coord_member = c.member and b.event_type = 'WAIT' and b.event_object = 'REQUEST' and b.event_state = 'IDLE';