I just wrapped up a very large PoC for BLU with DPF with a very high ingestion rate (110K messages per second) as well as a very high volume of point queries (1,500 queries per second). I used the following queries to measure the various parameters and find the optimum tuning.

The STMM is off by default for DPF with BLU, but we should still take advantage of some automation.

Rule – 1: If you are using more than one MLN (Multiple Logical Node) per physical host (or LPAR), divide the total memory of the host by the number of MLNs and then do the calculation. For example: if each host has 256 GB RAM and you decide to use 2 MLNs per host, the available memory per MLN is 128 GB. (Even this is a misnomer. The actual memory in Linux is usually less than the installed memory, so go by what the free command reports as available, not what is installed. For example: the installed memory is 256 GB but Linux reports only 251 GB available, so I would divide 251 GB by the number of MLNs, say 2.)
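
A minimal sketch of the Rule – 1 arithmetic in bash, assuming 2 MLNs per host and a free that prints the available column (as modern procps does):

 $ MLN_COUNT=2
 $ AVAIL_GB=$(free -g | awk '/^Mem:/ {print $7}')   # the "available" column, in GB
 $ echo "Memory per MLN: $((AVAIL_GB / MLN_COUNT)) GB"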

Rule – 2: Set INSTANCE_MEMORY to 90-95% of the memory available to the MLN, and keep it NON-AUTOMATIC.
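
A sketch using the running example (251 GB / 2 MLNs gives 125.5 GB per MLN; 90% is about 113 GB, or 29,622,272 4 KB pages; the figures are illustrative, not a recommendation):

 $ db2 update dbm cfg using INSTANCE_MEMORY 29622272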

Rule – 3: Set DATABASE_MEMORY to 90% of INSTANCE_MEMORY, and keep it AUTOMATIC. {Please note this.}
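
Continuing the same illustrative numbers (90% of 29,622,272 is about 26,660,044 4 KB pages), a starting value can be combined with the AUTOMATIC keyword:

 $ db2 update db cfg for <database> using DATABASE_MEMORY 26660044 AUTOMATIC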

Rule – 4: Set SHEAPTHRES_SHR to 40% of DATABASE_MEMORY and SORTHEAP to 1/20 of SHEAPTHRES_SHR. For my workload, I had to take SHEAPTHRES_SHR to 60% and the ratio to 1/5, as my system did not have sufficient memory to process a very large data set.
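
With the same illustrative DATABASE_MEMORY figure (40% is about 10,664,017 4 KB pages, and 1/20 of that is about 533,200 pages):

 $ db2 update db cfg for <database> using SHEAPTHRES_SHR 10664017
 $ db2 update db cfg for <database> using SORTHEAP 533200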

Rule – 5: If continuous LOAD is performed, keep UTIL_HEAP_SZ at 5000 pages and AUTOMATIC, and let Db2 adjust the memory. There is no need to bump this up if UTIL_HEAP_SZ and DATABASE_MEMORY are AUTOMATIC (even though STMM is turned off).
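
The corresponding command, with <database> as a placeholder:

 $ db2 update db cfg for <database> using UTIL_HEAP_SZ 5000 AUTOMATIC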

Rule – 6: Set the buffer pool to 40% of DATABASE_MEMORY and keep it NON-AUTOMATIC. Again, adjust this size based upon query performance requirements.
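
As a sketch, assuming a 32 KB page size (typical for BLU) and the running figures: 40% of DATABASE_MEMORY is about 10,664,017 4 KB pages, or roughly 1,333,002 32 KB pages; <bpname> is a placeholder for your buffer pool name. Giving an explicit SIZE with no AUTOMATIC keyword keeps the pool non-automatic.

 $ db2 "alter bufferpool <bpname> immediate size 1333002"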

Monitoring queries for BLU with DPF.

 $ cat mondpf
#!/bin/bash

# Run the monitoring loop for 7 hours (25200 seconds).
# SECONDS is a special bash variable that auto-increments, so use a plain name.
DURATION=25200

db2 connect to <database>

endTime=$(($(date +%s) + DURATION))
while [ $(date +%s) -lt $endTime ]
do
 echo Skew report;
 db2 -tf skew.sql
 echo Sort info;
 db2 -tf sort.sql
 echo logfull info;
 db2 -tf logfull.sql
 echo BP hit ratio;
 db2 -tf bp.sql
 echo Members info;
 db2 -tf members.sql
 sleep 60
 #db2 "call monreport.dbsummary(60)"
done
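
A typical way to run this for the full seven-hour window is in the background with the output captured (the output file name here is illustrative):

 $ chmod +x mondpf
 $ nohup ./mondpf > mondpf.out 2>&1 &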

This skew script is very important in a multi-member setup (either pureScale or DPF) to determine if any of the members is slow in writing to or reading from the disk. No matter how much monitoring you do of FC (Fibre Channel) card performance, you cannot make out anything by looking at the values in isolation when a particular member is slow in writing. This is one of the most difficult problems to solve. It took me 2 weeks to figure this out, and this SQL was the saver through which we found the members that were slow in writing and reading. This led us to change the FC cards, and when that was done, the performance gain was 4 times. Through this script, by comparing the direct_write_time of every member, we determined which members' FC cards were slow in writing. It was a big help.

 $ cat skew.sql
select member,
 bp_cur_buffsz,
 direct_writes,
 pool_data_writes,
 direct_write_time,
 pool_write_time,
 pool_col_writes,
 pool_async_col_writes
from TABLE(MON_GET_BUFFERPOOL('',-2))
where bp_name in ('MINBP','MAXBP')
order by member
;
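
A hedged companion to skew.sql: dividing direct_write_time (milliseconds) by direct_write_reqs (both are standard MON_GET_BUFFERPOOL columns) gives a rough milliseconds-per-write-request figure per member, which makes the cross-member comparison easier. The buffer pool names are the same assumptions as above.

select member,
 direct_write_reqs,
 direct_write_time,
 case when direct_write_reqs > 0
      then dec(float(direct_write_time) / direct_write_reqs, 10, 3)
      else null
 end as ms_per_write_req
from TABLE(MON_GET_BUFFERPOOL('',-2))
where bp_name in ('MINBP','MAXBP')
order by member
;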

In BLU with DPF, the maximum performance gain comes from a properly sized SHEAPTHRES_SHR and the right ratio between SHEAPTHRES_SHR and SORTHEAP. Proper tuning and adjustment of these values is helpful, but use this script to know whether sort overflows are occurring, and minimize them by increasing SHEAPTHRES_SHR.

 $ cat sort.sql
echo Monitor Sort Memory Usage - Obtain current and maximum sort usage for the database;

SELECT MEMBER, SORT_SHRHEAP_ALLOCATED,
 SORT_SHRHEAP_TOP
FROM TABLE(MON_GET_DATABASE(-2))
ORDER BY MEMBER
;

echo Extract percentage of sort operations that have spilled and high watermark sort usage;

WITH ops AS (
SELECT
 member, (total_sorts + total_hash_joins + total_hash_grpbys) as sort_ops,
 (sort_overflows + hash_join_overflows + hash_grpby_overflows) as overflows,
 sort_shrheap_top as sort_heap_top
FROM TABLE (mon_get_database(-2))
) SELECT member, sort_ops,
 overflows,
 (overflows * 100) / nullif(sort_ops,0) as pctoverflow,
 sort_heap_top
from ops
order by member;

The log full script is one each DBA must keep in their pocket at all times. I have seen hundreds of DBAs who will just increase LOGPRIMARY and LOGSECOND whenever they run into a LOG FULL condition, without realizing what is causing it. Find the offending application that is holding a log file hostage and thus leading to the LOG FULL condition. I have covered this in detail elsewhere as well.

 $ cat logfull.sql
SELECT MEMBER,
 TOTAL_LOG_AVAILABLE / 1048576 AS LOG_AVAILABLE_MB,
 TOTAL_LOG_USED / 1048576 AS LOG_USED_MB,
 CAST (((CASE WHEN (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) = 0
 OR (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) IS NULL
 OR TOTAL_LOG_AVAILABLE = -1 THEN NULL
 ELSE ((CAST ((TOTAL_LOG_USED) AS DOUBLE) / CAST (
 (TOTAL_LOG_AVAILABLE + TOTAL_LOG_USED) AS DOUBLE))) * 100
 END)) AS DECIMAL (5,2)) AS USED_PCT,
 APPLID_HOLDING_OLDEST_XACT
FROM TABLE (MON_GET_TRANSACTION_LOG(-2))
ORDER BY member, USED_PCT DESC;
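
Once APPLID_HOLDING_OLDEST_XACT identifies the culprit, a hedged follow-up is to drill into its unit of work with MON_GET_UNIT_OF_WORK (standard columns) before deciding what to do with the connection:

-- Top log-space consumers; match application_handle to APPLID_HOLDING_OLDEST_XACT
SELECT application_handle,
 member,
 uow_log_space_used,
 uow_start_time
FROM TABLE(MON_GET_UNIT_OF_WORK(CAST(NULL AS BIGINT), -2))
ORDER BY uow_log_space_used DESC
FETCH FIRST 5 ROWS ONLY;

If the transaction is truly a runaway, db2 "force application (12345)" (the handle here is hypothetical) releases the hostage log file.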

The buffer pool hit ratio is meaningless to me. It is good 99.9% of the time, so why do we measure it? The key is to see whether we have enough pool size or not. I am still searching for the silver bullet. This is the least important thing to monitor in my view, but I am still including the script.

 $ cat bp.sql
WITH BPMETRICS AS (
 SELECT bp_name,
 pool_data_l_reads + pool_temp_data_l_reads +
 pool_index_l_reads + pool_temp_index_l_reads +
 pool_xda_l_reads + pool_temp_xda_l_reads as logical_reads,
 pool_data_p_reads + pool_temp_data_p_reads +
 pool_index_p_reads + pool_temp_index_p_reads +
 pool_xda_p_reads + pool_temp_xda_p_reads as physical_reads,
 member
 FROM TABLE(MON_GET_BUFFERPOOL('',-2)) AS METRICS)
 SELECT
 VARCHAR(bp_name,20) AS bp_name,
 logical_reads,
 physical_reads,
 CASE WHEN logical_reads > 0
 THEN DEC((1 - (FLOAT(physical_reads) / FLOAT(logical_reads))) * 100,5,2)
 ELSE NULL
 END AS HIT_RATIO,
 member
 FROM BPMETRICS;

This script requires that you create a workload for the purpose of monitoring. This is the best way to find out the performance of a client application at a member level. I have covered how to do this in detail elsewhere.
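
A minimal sketch of such a workload definition, assuming the client connects with the application name myclientapp (both names here are hypothetical):

 $ db2 "create workload CLIENT_WL applname('myclientapp') collect activity metrics base"
 $ db2 "grant usage on workload CLIENT_WL to public"

With this in place, 'Your Work Load Name' in members.sql becomes CLIENT_WL.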

 $ cat members.sql
select substr(host_name,1,20) host, 
       t.member, act_completed_total + act_aborted_total AS TOTAL_TRANS,
       cast(total_act_time/1000.0 as decimal(16,2)) ACT_TIME_SECS 
from table(mon_get_workload('Your Work Load Name',-2)) as t,
     table (env_get_system_resources()) m 
where m.member = t.member 
order by host, t.member;

I like to run as few scripts as possible for monitoring and let the database do its job, rather than monitoring every aspect of it, which is not a good idea once you have tuned the system. Monitoring has a drag, in the same fashion as 20 managers asking a high-performing employee for a status update every hour. I have seen this in all the major corporations I have visited in the last 14 years. We have more status-seeking managers than high-performing employees nowadays. Does that sound right?